OpenAI’s new AI-generated videos surprised many people


OpenAI’s latest venture into AI may be its most impressive to date. Dubbed “Sora,” this new text-to-video AI model has just opened its doors to a limited number of users who can test it out. The company launched it by showing several videos created entirely by AI, and the results are strikingly realistic.

OpenAI introduced Sora by saying that it can create realistic scenes based on text prompts, and the videos shared on its website prove it. The prompts are descriptive but brief; I personally use longer prompts just to chat with ChatGPT. For example, to create a woolly mammoth video like the one pictured above, Sora needed a 67-word prompt that described the animal, its surroundings, and the camera placement.
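To give a sense of what “descriptive but brief” means in practice, here is a purely hypothetical sketch. Sora has no public API at the time of writing, so the function and parameter names below are invented for illustration, and the prompt text is merely in the spirit of OpenAI’s mammoth example, not the actual 67-word command:

```python
# Hypothetical sketch only: Sora has no public API at the time of writing,
# so request_video() and its parameters are invented for illustration.
# The point is the shape of a good prompt: the subject, its surroundings,
# and the camera placement, packed into a few dense sentences.

PROMPT = (
    "Several giant woolly mammoths tread through a snowy meadow, "      # the animal
    "their long fur blowing lightly in the wind. Snow-covered trees "   # its surroundings
    "and dramatic snow-capped mountains sit in the distance. A low "
    "camera angle captures the large furry mammals with a shallow "     # camera placement
    "depth of field."
)

def request_video(prompt: str, duration_seconds: int = 60) -> dict:
    """Stand-in for whatever text-to-video endpoint eventually ships."""
    # A real call would POST the prompt and desired clip length to a video
    # API; returning a stub keeps this sketch runnable as-is.
    return {"prompt": prompt, "duration_seconds": duration_seconds, "status": "queued"}

if __name__ == "__main__":
    job = request_video(PROMPT)
    print(job["status"], "-", len(job["prompt"].split()), "words in prompt")
```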

“Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt,” OpenAI said in its announcement. The AI can generate complex scenes with multiple characters, specific types of motion, and accurate details. To that end, OpenAI says that Sora can infer what’s needed and read between the lines of a prompt.

“The model understands not only what the user is asking for in the prompt, but also how those things exist in the physical world,” OpenAI said. Sora doesn’t just render characters, clothing, and backgrounds; according to OpenAI, it creates “compelling characters that express vivid emotions.”

Sora can also fill in gaps in existing videos or extend them, and it can generate videos from still images, so it isn’t limited to text prompts alone.

Introducing Sora, our text-to-video model.

Sora can create videos up to 60 seconds long featuring highly detailed scenes, complex camera movements, and multiple characters with vivid emotions.

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf

— OpenAI (@OpenAI) February 15, 2024

While the videos look great as still screenshots, the movement is what’s truly stunning. OpenAI shared a variety of clips to show off the new technology, including cyberpunk-style streets of Tokyo and “historical footage” of California during the Gold Rush. There’s more, including a close-up of a human eye. The prompts cover everything from cartoons to wildlife photography.

Sora still makes some mistakes. A closer look reveals, for example, that some figures in a crowd have no heads or move strangely. In some samples the awkward movement is obvious at first glance, but in others it takes several viewings to notice that anything is off.

It may be a while before OpenAI opens Sora up to the general public. For now, the model is being tested by red teamers who will assess potential risks. A number of content creators will also start testing it while it’s still in the early stages of development.

The AI is still not perfect, so I was expecting something pretty messy. Whether it was because of those low expectations or Sora’s actual abilities, I came away impressed, but also a little worried. We already live in a world where it’s hard to tell what’s fake from what’s real, and now it’s not just images that are suspect, but videos too. That said, Sora isn’t the first text-to-video model we’ve seen; Pika, for example, came before it.

If OpenAI’s Sora is this good now, it’s hard to imagine what it will be capable of after several years of further development and testing. This technology has the potential to replace many jobs — but, hopefully, like ChatGPT, it will be able to coexist with human professionals.
