Notes from the Studio: Why Jaws are Dropping Over SORA AI

Remember that AI-generated video of Will Smith eating spaghetti that took over the internet? Or that artificial Pepperoni Hug Spot pizza commercial? If you’ve seen them, there’s no way you could forget them. Entertaining and also pretty disturbing, they were a preview of what generative video AI tools were attempting to achieve. But anything actually believable felt a long way off.

That was just a short year ago. Now, videos generated by SORA AI are more like this. Pretty incredible, right? So how far has generative AI really come? What’s still on the horizon?

As Assemble’s Studio Manager, keeping a pulse on this headway is what I do, and what I love doing. So in this Notes from the Studio update, we’re taking a look at how far tools like SORA have come, how creative teams can use them, and which other technologies are trying to challenge them.

What is SORA AI?

While the underlying technology behind SORA AI is still a mystery to even most AI experts due to OpenAI’s tight lips on the topic, what we do know is that this text-to-video AI model is capable of producing some undeniably realistic videos. The company is teaching the tech to understand and simulate the physical world in motion, training it on videos that were either publicly available or licensed from copyright owners.

While the company has been sued multiple times for alleged copyright infringement across its collection of generative AI tools, its relentless push to fine-tune the tech continues, and it’s making serious waves. Just a year ago, similar tools were creating wildly distorted and bizarre videos with the allure of a train wreck. Now, SORA is “wowing” experts and redefining expectations for what’s possible in AI video production.

Currently, SORA remains in beta and is accessible only to red team members tasked with assessing potential risks and issues. But OpenAI says anyone will soon be able to get their hands on the tool to easily produce entire video clips.

What’s it Capable of?

From simple text prompts, SORA AI can create videos up to 60 seconds long. It’s said to be capable of high-definition scenes with intricate camera motion and character movements. It does this with a diffusion model, which starts from static noise and gradually removes it over several steps to refine the output. When comparing it to OpenAI’s generative image technology, DALL·E, experts are putting it in a totally different category. Instead of being rooted in creativity, SORA is scientific and data-driven, “learning intricate rendering, ‘intuitive’ physics, long-horizon reasoning, and semantic grounding.”
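
For the curious, here is a toy Python sketch of that "remove the noise step by step" idea. It is only an illustration and not how SORA works under the hood, since OpenAI has not published those details. The function name `toy_denoise`, the five-value `target`, and the step count are all made up for this example, and the "noise prediction" is computed from a known answer, where a real diffusion model would rely on a trained neural network guided by the text prompt.

```python
import random


def toy_denoise(target, steps=10, seed=0):
    """Toy illustration of iterative denoising (hypothetical, not SORA's method)."""
    rng = random.Random(seed)
    # Start from pure random noise, one value per "pixel".
    frame = [rng.gauss(0.0, 1.0) for _ in target]
    history = [frame]
    for step in range(steps):
        # A real model would predict the noise with a trained network.
        # Here we cheat and compute it from the known target.
        predicted_noise = [f - t for f, t in zip(frame, target)]
        # Remove a growing share of the remaining noise on each pass,
        # so the final pass removes whatever is left.
        fraction = 1.0 / (steps - step)
        frame = [f - fraction * n for f, n in zip(frame, predicted_noise)]
        history.append(frame)
    return history


def error(frame, target):
    """Total absolute distance between a frame and the clean target."""
    return sum(abs(f - t) for f, t in zip(frame, target))


target = [0.0, 0.5, 1.0, 0.5, 0.0]  # stand-in for a clean image
history = toy_denoise(target)
errors = [error(frame, target) for frame in history]

for step, err in enumerate(errors):
    print(f"step {step:2d}: remaining error {err:.4f}")
```

Running it shows the remaining error shrinking on every pass until the noisy starting values match the target, which is the basic intuition behind refining the output over several steps.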

But even OpenAI admits its technology isn’t perfect. And you can see it — if you look closely. In its current state, SORA “struggles with accurately simulating the physics of a complex scene.” While the technology appears much more advanced for landscapes and scenery, it’s still working to master more natural, mechanical, or technical motions.

For example, take a look at this Big Sur Drone Footage. Then look at this Grandma’s Birthday Scene. While at first glance both could be mistaken for real videos, a deeper inspection is a bit more telling. The landscape is far less flawed. Look again at the grandmother, and you’ll see her movements aren’t quite natural, almost Jack Sparrow-esque. If she was trying to blow out the candles, she completely missed. Her hands are odd and constantly changing (AI’s biggest weakness). The direction of the wind on the candle flames is very inconsistent.

At Assemble, we’ve been able to create some incredible outdoor landscapes and 3D product renders that even a highly trained eye could not tell apart from the real thing. But we’ve noticed the same struggle with human anatomy and technical mechanisms while testing image generation for our brands. Renders of complex musical instruments with multiple components, like a sax or a piano, continue to be inaccurate. Those without a musical background may not be as quick to notice, but the errors are there, for now.

What are the Risks?

Right now, deepfake videos remain the top concern as these video productions get more and more realistic. Combined with the existing power of voice AI software, the potential for video impersonation is greater than ever, especially for politically motivated purposes. Generative AI as a whole opens the door to malicious misuse in the wrong hands, including misleading or fabricated content that is almost impossible for the untrained eye to detect. Further, if these AI systems are not secure, they can become targets of cyberattacks and sources of accidental leaks. Being extremely mindful of these risks and aligning your teams on a plan to prevent them will be crucial to reaping the benefits and avoiding the dangers.

Who are the Top Competitors Right Now?

  1. Colossyan: Best for collaborative video production. Designed much like a project management tool, Colossyan uses a model where admins can create multiple workspaces for their teams. You can access prebuilt project templates, add subtitles and captions, integrate screen captures, incorporate storylines, and save brand kits for consistency. The downside: limited template options and issues with lip-syncing.

  2. Synthesia: Best for AI avatars. This technology transforms text scripts and instructions into videos, giving users over 140 high-quality AI avatars to choose from, or the ability to create their own. You can also generate audio in more than 120 languages and voices, customize background and branding, add soundtrack music, and insert markers to create simple animations. The downside: the larger the screen, the less believable it is.

  3. Stable Video Diffusion: Best for diverse video applications. Stable Diffusion generates high-quality designs that surpass a lot of traditional design methods through next-level control and predictability throughout the process. It can stimulate new ideas and enhance specific features of images, even those with low-quality sources. The downside: it is computationally intensive and isn’t ideal for certain types of images and editing tasks.

  4. InVideo: Best for social media and content marketing videos. InVideo converts simple text prompts into video scripts, scenes, and voiceovers. Its custom template library is one of the most extensive on the market (4,000+), with over 1 million royalty-free images and videos, including topics and formats for advertising, memes, logo videos, YouTube, Instagram, and more. The downside: video rendering is slower and audio editing options are more limited than most.

  5. Runway: Best for ease of use and accommodating diverse data types. It has more than 30 AI-powered creative tools with an array of pre-trained models to effortlessly generate text for videos and images. Runway also has online editing capabilities that extend to advanced post-processing tasks, as well as a research division that develops multimodal AI systems to enable new types of creative tools. The downside: it has a higher risk of error and misinterpretation of data.

How are Creative Teams Using it?

At Assemble, we confidently expect to start using generative tools to develop assets more cost-effectively very soon. We’re starting to have conversations about how we can tap into the power of generative AI to save time and money when producing hundreds of assets for our brands’ campaigns. For now, we’re exploring a hybrid of real stock photos and AI-generated photos to help move the process along more quickly. Our graphic designers are working to write prompts that produce the most accurate results as efficiently as possible, and then making manual adjustments where needed. Instead of starting from scratch, or spending time searching for just the right photo or video, we’ll be taking advantage of how tools like SORA AI can expedite the process and eliminate the intimidation of a blank canvas.

That being said, we’ll always be big believers that the creativity and authenticity of the human touch can never be completely replaced. But as these generative AI tools continue to evolve, they will certainly be game-changing in terms of efficiency.

To chat with us about what’s to come in generative AI video and how you can take advantage of it, contact our experts today.