Google's Gemini Omni: Just Speak, AI Fixes Your Videos

At Google I/O 2026, Google unveiled Gemini Omni, a model that accepts text, images, audio, and video as input and can generate and edit content across these modalities.

For now, audio input is limited to speech, but Google says support for other types of audio will follow soon. The first product built on the model, Gemini Omni Flash, is already live in the Gemini app, and Google plans to open API access to enterprise customers later.

Video editing is one of Gemini Omni's headline features. Users can refine a video step by step with natural-language instructions: adding or removing objects, changing the camera angle, or restyling the environment. Google attributes the visual coherence and logical consistency of the results to the model's understanding of physics and its grounding in historical, scientific, and cultural knowledge. The model can also predict how a scene might continue, and users can create digital avatars and place them in their videos.

Google also highlighted safety measures. Every video generated with Omni automatically carries a SynthID digital watermark, which can be verified through Google Search and Chrome. The aim is to help counter misinformation and unauthorized use.

Gemini Omni Flash is currently available to Google AI Plus, Pro, and Ultra subscribers in the Gemini app and Google Flow. It is also free to use for editing YouTube Shorts and in YouTube Create.

Google DeepMind CEO Demis Hassabis described these developments as a significant step in moving AI beyond task-specific tools and toward artificial general intelligence (AGI), extending what machines can do in both creative and analytical work.