Google introduces Gemini Omni, a multimodal AI for video creation from various inputs
May 19, 2026
AI Summary
Google has unveiled Gemini Omni, a new multimodal AI model that can generate videos from images, audio, and text. The technology aims to simplify video creation for consumers and has potential applications in advertising and filmmaking. Google plans future enhancements, including support for longer videos.
- Google launched Gemini Omni, a multimodal AI model, at its Google I/O developer conference. The model creates videos by combining images, audio, and text, producing coherent output that reflects an understanding of the subjects involved.
- Users can edit photos using plain text commands, making the process more accessible. The first model, Gemini Omni Flash, will allow users to create 10-second videos and is designed for consumer use.
- To prevent misuse, users must complete a verification process before creating digital avatars, and every generated video will carry a digital watermark to help establish its authenticity.
- Google plans to make Omni available through an API, targeting both consumer and enterprise markets, with potential applications for content creators and advertisers.
- Future developments will include support for longer videos and a more advanced Omni Pro model, which is expected to outperform the Flash version.
Tags: google, gemini omni, multimodal, video generation, ai technology