Meet Gemini Omni

Step into the Omni-verse ✨ Meet Gemini Omni, our new model that can create anything from any input, starting with video.

🎸 Swap out backgrounds to place yourself in the clouds

🎨 Apply wild new styles and built-in templates to your footage

🎞️ Keep your characters consistent from scene to scene

Gemini Omni is rolling out globally today. It is available directly in the Gemini app for Plus, Pro, and Ultra subscribers.

#GoogleIO #GoogleGemini

The music featured throughout this video (0:00–0:51) has a high-energy, upbeat, and modern pop aesthetic. It includes rhythmic, percussive elements that align with the lyrics, creating a fast-paced and driving atmosphere. The composition is dynamic, featuring recurring instrumental breaks (0:00, 0:04, 0:12, 0:20, 0:31, 0:41, 0:48) that punctuate the vocals, giving the track a distinct, stylized feel that complements the video’s focus on creative editing and visual transformation.

Gemini Omni is Google’s natively multimodal “anything in, anything out” model family. Unveiled at Google I/O, it combines Gemini’s reasoning and real-world logic with high-fidelity creative generation, starting with video generation and video editing.

Rather than running text, images, and audio through separate pipelines bolted together, Omni processes all of these modalities simultaneously.

Core Capabilities of Gemini Omni

Conversational Video Editing: Instead of wrestling with traditional timeline-based editing software, you can modify video footage by simply talking or typing instructions. Because the model retains context across multiple turns of a conversation, you can stack edits incrementally.
Multimodal Input Referencing: You can blend entirely different formats into a single output. For example, you can upload an image of a character, a text description of a setting, and a sketch of an object, and Omni will synthesize them into one cohesive video clip.
Physics-Aware World Modeling: Omni doesn’t just match visual patterns; it models underlying real-world physics. It accounts for physical quantities and effects such as gravity, kinetic energy, and fluid dynamics so that generated motion looks believable.
Character & Scene Consistency: One of the biggest challenges in generative video has been “hallucinated details” shifting between frames. Omni anchors character identities and environmental details, keeping them visually consistent across cuts and multi-turn edits.
AI Avatars (Beta): Paid subscribers can record a quick selfie video via the input menu to create a digital avatar that captures their exact likeness and voice, allowing them to place themselves seamlessly into generated video content.
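
The incremental, multi-turn editing described under Conversational Video Editing can be pictured as a session that keeps every earlier instruction as context for the next one. The sketch below is purely illustrative: `EditSession`, `EditTurn`, `add_instruction`, and the file name `beach.mp4` are hypothetical names invented for this example, and the code does not call or represent any actual Gemini Omni API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class EditTurn:
    """One conversational instruction (hypothetical type, not an Omni API)."""
    index: int
    instruction: str


@dataclass
class EditSession:
    """Hypothetical container that keeps every prior edit as context."""
    source_clip: str
    turns: List[EditTurn] = field(default_factory=list)

    def add_instruction(self, instruction: str) -> EditTurn:
        # Each new turn is appended, so later edits stack on earlier ones.
        turn = EditTurn(index=len(self.turns) + 1, instruction=instruction.strip())
        self.turns.append(turn)
        return turn

    def context(self) -> List[str]:
        # All instructions so far, oldest first.
        return [t.instruction for t in self.turns]

    def undo_last(self) -> Optional[EditTurn]:
        # Drop the most recent edit, if any, leaving earlier edits in place.
        return self.turns.pop() if self.turns else None


session = EditSession(source_clip="beach.mp4")
session.add_instruction("Swap the background for clouds")
session.add_instruction("Apply a watercolor style")
print(session.context())
```

Keeping the turns in an ordered list is one simple way to model stacked edits: a follow-up such as "make it brighter" can be read against everything that came before it, and the last step can be undone without losing the earlier ones.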

Safety & Distribution

To combat deepfakes and AI-generated misinformation, Google DeepMind has built its SynthID digital watermarking directly into Omni. Every generated video contains an imperceptible digital watermark that can be verified via Google Search, Chrome, or the Gemini app. In addition, speech and audio editing features for video are being withheld from broader public release while safety testing continues.

Availability

Premium Subscribers: The first model of the family, Gemini Omni Flash, is available to Google AI Plus, Pro, and Ultra subscribers in the Gemini app and Google Flow.
General Public: Standard users can access Omni Flash features at no cost through integrations inside YouTube Shorts and the YouTube Create app.
Developers & Enterprise: API access for custom enterprise integrations is rolling out to corporate clients via Google Cloud.