OpenAI, known for ChatGPT, unveiled its debut AI-driven text-to-video model, named Sora, on Thursday. According to the company, Sora can produce videos lasting up to one minute.
Sora surpasses competitors like Google’s Lumiere by offering longer video generation capabilities. It’s currently accessible to red teamers—experts who adversarially test software for flaws and potential for misuse—and select content creators.
OpenAI also intends to integrate Coalition for Content Provenance and Authenticity (C2PA) metadata into Sora’s output once the model is deployed as part of its product lineup.
In a post on X (formerly known as Twitter), the company announced the AI video generator, stating, “Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.”
Remarkably, Sora’s claimed video length is more than ten times that of its competitors. While Google’s Lumiere produces 5-second videos, Runway AI and Pika 1.0 offer even shorter durations of 4 seconds and 3 seconds, respectively.
OpenAI’s X account, along with CEO Sam Altman, shared numerous videos generated by Sora alongside the prompts that guided their creation.
The resulting videos showcase remarkable detail and fluid motion, qualities that set them apart from other video generators currently available in the market.
According to the company, Sora can create intricate scenes featuring multiple characters, various camera angles, specific motions, and precise details of both subjects and backgrounds.
This capability stems from the text-to-video model’s utilization of both prompts and a comprehensive understanding of “how these things exist in the physical world.”
Sora operates as a diffusion model, using a transformer architecture similar to GPT models. Accordingly, it processes and generates data using “patches,” a concept analogous to tokens in text-based models.
According to the company, patches are small chunks of visual data into which videos and images are broken down, much as text is split into tokens.
OpenAI trained the video generation model using this visual data across various durations, resolutions, and aspect ratios.
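The idea of turning visual data into patches can be illustrated with a minimal sketch. The patch sizes, tensor layout, and function name below are illustrative assumptions, not details OpenAI has published about Sora:

```python
import numpy as np

def extract_patches(video, patch_t=2, patch_h=16, patch_w=16):
    """Split a video tensor (frames, height, width, channels) into
    spacetime patches, each flattened into a vector -- loosely analogous
    to tokenizing text. Sizes here are hypothetical, not Sora's."""
    f, h, w, c = video.shape
    # Trim so each dimension divides evenly by its patch size.
    f, h, w = f - f % patch_t, h - h % patch_h, w - w % patch_w
    video = video[:f, :h, :w]
    patches = video.reshape(
        f // patch_t, patch_t,
        h // patch_h, patch_h,
        w // patch_w, patch_w, c,
    ).transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch: the model's "token" sequence.
    return patches.reshape(-1, patch_t * patch_h * patch_w * c)

# A 16-frame, 64x64 RGB clip yields (16/2)*(64/16)*(64/16) = 128 patches.
video = np.zeros((16, 64, 64, 3), dtype=np.float32)
tokens = extract_patches(video)
print(tokens.shape)  # (128, 1536)
```

Because the trimming step handles arbitrary input sizes, a scheme like this can accept clips of varying durations, resolutions, and aspect ratios, which matches how OpenAI describes training on diverse visual data.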
Beyond text-to-video generation, Sora possesses the capability to transform a still image into a dynamic video.
OpenAI acknowledged on its website that “The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterwards, the cookie may not have a bite mark.”
To prevent misuse of the AI tool for creating deceptive content like deepfakes, OpenAI is building tools to detect misleading content.
Additionally, the company intends to incorporate C2PA metadata into the generated videos, following a similar approach recently implemented for its DALL-E 3 model.
OpenAI is collaborating with red teamers, particularly domain experts in misinformation, hateful content, and bias, to adversarially test the model before wider release.
Currently, Sora is accessible exclusively to red teamers and a select group of visual artists, designers, and filmmakers. This limited availability enables OpenAI to gather valuable feedback on the product’s performance and usability.