Google introduces Gemini Omni, a multimodal AI for video creation from various inputs

May 19, 2026

AI Summary

Google has unveiled Gemini Omni, a new multimodal AI model that generates videos from images, audio, and text. The model is meant to simplify video creation for consumers and has potential applications in advertising and filmmaking. Google plans to add support for longer videos later.

Google launched Gemini Omni, a multimodal AI model, at the Google I/O developer conference. The model can create videos by combining images, audio, and text, producing coherent outputs that reflect an understanding of various subjects.
Users can edit photos using plain text commands, making the process more accessible. The first model, Gemini Omni Flash, will allow users to create 10-second videos and is designed for consumer use.
To prevent misuse, users must complete a verification process before creating digital avatars, and all videos will carry a digital watermark for authenticity.
Google plans to make Omni available via API, targeting both consumer and enterprise markets, with potential applications for content creators and advertisers.
Google plans to support longer videos in the future and to release a more advanced Omni Pro model, which is expected to outperform the Flash version.

google, gemini omni, multimodal, video generation, ai technology
