Gemini 2.0: Google’s Most Advanced AI Model Yet
Embracing the Agentic Era of AI
A Message from Sundar Pichai, CEO of Google and Alphabet
Information is the foundation of human progress. Google has spent over 26 years organizing the world's information and making it accessible and useful. AI is the next frontier: organizing this information across every input and making it accessible via any output, so that it becomes truly useful.
Gemini 1.0 and 1.5 were groundbreaking in their multimodality and long context, enabling understanding of information across text, video, images, audio, and code. Millions of developers are currently building with Gemini, and it is helping to reimagine all of Google’s products.
Over the past year, Google has been investing in more agentic models: models that can understand the world around them, think multiple steps ahead, and take action on a user's behalf, with supervision.
Introducing Gemini 2.0
Gemini 2.0 is Google's most capable model yet, designed for the agentic era. With advances in multimodality, like native image and audio output and native tool use, it enables new AI agents that bring us closer to the vision of a universal assistant.
The Gemini 2.0 Flash experimental model will be available to all Gemini users. A new feature called Deep Research, which uses advanced reasoning and long-context capabilities to act as a research assistant, is available in Gemini Advanced.
Gemini 2.0 and Search
AI has transformed Google Search. AI Overviews now reach 1 billion people, enabling them to ask new types of questions. The advanced reasoning capabilities of Gemini 2.0 are coming to AI Overviews to handle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding.
The Technology Behind Gemini 2.0
Gemini 2.0 was built on custom hardware like Trillium, Google's sixth-generation TPUs. TPUs powered 100% of Gemini 2.0 training and inference, and Trillium is now generally available to customers.
Gemini 2.0: From Information Organization to Enhanced Usefulness
Gemini 1.0 focused on organizing and understanding information, while Gemini 2.0 aims to make it significantly more useful.
Gemini 2.0 Flash
Gemini 2.0 Flash, the first model in the Gemini 2.0 family, is an experimental release of Google's workhorse model, offering low latency and enhanced performance at scale.
Features and Enhancements
- Enhanced performance compared to 1.5 Flash, even outperforming 1.5 Pro on key benchmarks at twice the speed.
- Supports multimodal inputs like images, video, and audio.
- Supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio.
- Can natively call tools like Google Search and code execution, as well as third-party user-defined functions (see the sketch after this list).
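To make the tool-use point concrete, here is a minimal sketch of calling a user-defined function through the Gemini API with the `google-generativeai` Python SDK. The model name `gemini-2.0-flash-exp` and the `get_weather` helper are illustrative assumptions, not details confirmed by the announcement.

```python
# Minimal sketch of third-party function calling via the Gemini API.
# Assumptions: model name "gemini-2.0-flash-exp" and the get_weather
# helper are hypothetical, chosen only to illustrate the pattern.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Hypothetical user-defined tool the model may choose to call."""
    return f"Sunny and 22°C in {city}"  # stubbed result for the sketch

# The SDK turns the Python function into a function declaration that
# the model can invoke natively alongside built-in tools.
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",
    tools=[get_weather],
)

# With automatic function calling enabled, the SDK runs get_weather
# when the model requests it and feeds the result back to the model.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("What's the weather in Zurich right now?")
print(response.text)
```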
Availability
- Available as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI.
- Multimodal input and text output are available to all developers (see the example after this list).
- Text-to-speech and native image generation are available to early-access partners.
- General availability will begin in January, along with more model sizes.
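Since multimodal input with text output is the part open to all developers, a minimal sketch of an image-plus-text prompt may help; again, the model name and image path are assumptions for illustration.

```python
# Minimal sketch of multimodal input (image + text) with text output,
# using the google-generativeai Python SDK and Pillow.
# Assumptions: the model name and local image path are illustrative.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash-exp")
image = Image.open("circuit_board.jpg")  # any local image file

# A mixed image + text prompt; the response comes back as plain text.
response = model.generate_content(
    [image, "Describe this image and list any components you can identify."]
)
print(response.text)
```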
Gemini 2.0 in the Gemini App
Gemini users can access a chat-optimized version of 2.0 Flash experimental by selecting it in the model drop-down on desktop and mobile web. It will be available in the Gemini mobile app soon.
Unlocking Agentic Experiences with Gemini 2.0
Gemini 2.0 Flash’s features, including its native user interface action capabilities, multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use, and improved latency, enable a new class of agentic experiences.
Project Astra: Agents Using Multimodal Understanding in the Real World
Project Astra is a research prototype exploring the future capabilities of a universal AI assistant, currently being tested on Android phones. Improvements in the latest version built with Gemini 2.0 include:
Improvements in the Latest Version
- Better dialogue: Ability to converse in multiple languages and in mixed languages, with a better understanding of accents and uncommon words.
- New tool use: With Gemini 2.0, Project Astra can use tools like Google Search, Lens, and Maps.
- Better memory: Up to 10 minutes of in-session memory, plus an improved ability to remember past conversations.
- Improved latency: Can understand language at about the latency of human conversation due to new streaming capabilities and native audio understanding.
Google is working to bring these capabilities to products like the Gemini app and other form factors like glasses. A small group will soon begin testing Project Astra on prototype glasses.
Project Mariner: Agents That Can Help You Accomplish Complex Tasks
Project Mariner is an early research prototype built with Gemini 2.0 exploring the future of human-agent interaction within a web browser. It can understand and reason across information on a browser screen, including pixels and web elements like text, code, images, and forms, and then uses that information via an experimental Chrome extension to complete tasks.
Performance
Project Mariner achieved a state-of-the-art result of 83.5% on the WebVoyager benchmark, which tests agent performance on end-to-end real-world web tasks.
Safety and Responsibility
Project Mariner can only type, scroll, or click in the active browser tab and asks users for final confirmation before sensitive actions, like purchasing something.
Availability
Project Mariner is currently available to trusted testers via an experimental Chrome extension, and conversations with the broader web ecosystem are underway.
Jules: Agents for Developers
Jules is an experimental AI-powered code agent built with Gemini 2.0 that integrates into a GitHub workflow. It can tackle an issue, develop a plan, and execute it under a developer's direction and supervision. This is part of Google’s goal of building AI agents that are helpful in all domains, including coding.
Agents in Games and Other Domains
Google DeepMind has used games to train AI models in following rules, planning, and logic. Genie 2, an AI model that can create endless playable 3D worlds from a single image, is a recent example of this.
Agents in Video Games
Google has built agents using Gemini 2.0 that can help users navigate the virtual worlds of video games. These agents can reason about the game based on on-screen action, offer suggestions in real-time conversation, and even use Google Search to connect users with gaming knowledge on the web.
Google is collaborating with game developers like Supercell to explore how these agents work across various games.
Agents in the Physical World
Google is experimenting with agents that can help in the physical world by applying Gemini 2.0's spatial reasoning capabilities to robotics.
Building Responsibly in the Agentic Era
Google acknowledges the responsibility that comes with developing new AI technologies and the safety and security questions AI agents raise. Therefore, they are taking an exploratory and gradual approach to development.
Safety Measures
- Working with the Responsibility and Safety Committee (RSC) to identify and understand potential risks.
- Using Gemini 2.0's reasoning capabilities for AI-assisted red teaming, automatically generating evaluations and training data to mitigate risks.
- Evaluating and training the model across image and audio input and output to improve safety.
- Exploring mitigations against unintentional sharing of sensitive information with the agent in Project Astra, building in privacy controls for deleting sessions, and researching ways to ensure AI agents are reliable sources of information.
- Ensuring Project Mariner prioritizes user instructions over third-party prompt injection attempts, so it can identify potentially malicious instructions and protect users from fraud and phishing.
Gemini 2.0, AI Agents, and Beyond
The release of Gemini 2.0 Flash and research prototypes exploring agentic possibilities mark a new chapter in the Gemini era. Google is committed to responsibly exploring new possibilities as it builds towards AGI.