Google launches Project Mariner, an AI agent that can understand and reason about information on the browser screen to help complete tasks
Last night, Google announced Gemini 2.0, the latest model from its AI team, with multimodal support capable of understanding content ranging from images to video.
Using this model, Google has built an AI agent called Project Mariner, an early research prototype based on Gemini 2.0 that aims to explore the future of human-computer interaction, starting with the browser.
The agent can understand and reason about the information on the browser screen, including pixels, text, code, images, and web elements such as forms, and then uses this information to complete tasks through an experimental Chrome extension.
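Google has not published Project Mariner's internals, but the ingredients named here (pixels, text, forms, an extension) map onto Chrome's standard extension APIs. Purely as an illustration, a background service worker might gather that screen state along the following lines; the permissions, the `captureBrowserState` function, and the shape of the returned state object are all assumptions of mine:

```typescript
// Hypothetical sketch of a Manifest V3 background service worker.
// Assumes "tabs", "scripting", and "activeTab" permissions in manifest.json.

interface BrowserState {
  screenshot: string;    // data: URL of the visible viewport (the pixels)
  text: string;          // visible page text
  formActions: string[]; // submit targets of any forms on the page
}

async function captureBrowserState(tabId: number): Promise<BrowserState> {
  // Screenshot of the visible tab: the raw pixels the model reasons over.
  const screenshot = await chrome.tabs.captureVisibleTab({ format: "png" });

  // Pull text and form metadata out of the page itself.
  const [injection] = await chrome.scripting.executeScript({
    target: { tabId },
    func: () => ({
      text: document.body.innerText,
      formActions: Array.from(document.forms).map((f) => f.action),
    }),
  });

  const page = injection.result as { text: string; formActions: string[] };
  return { screenshot, text: page.text, formActions: page.formActions };
}
```

Treating the screenshot as the primary observation keeps an approach like this general: it still works on pages whose DOM structure is hard to interpret.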
Take, for instance, a webpage full of data that needs to be copied and organized into a spreadsheet: this is exactly where the agent comes into play.
Once you give Project Mariner an instruction, the agent interacts with the browser on its own, organizing the data and entering it into the designated areas of the page according to your requirements.
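How an instruction turns into browser activity is not described in the announcement, but agent systems of this kind typically run an observe-decide-act loop. A minimal sketch under that assumption, where `chooseNextAction` stands in for the model call and every helper here is a placeholder rather than a documented API:

```typescript
// Hypothetical observe-decide-act loop. captureBrowserState is sketched
// above; chooseNextAction would call a Gemini-style model; performAction
// is sketched in the next snippet.
type AgentStep =
  | { kind: "act"; description: string }
  | { kind: "done" };

declare function captureBrowserState(tabId: number): Promise<unknown>;
declare function chooseNextAction(instruction: string, state: unknown): Promise<AgentStep>;
declare function performAction(tabId: number, step: AgentStep): Promise<void>;

async function runTask(tabId: number, instruction: string): Promise<void> {
  const MAX_STEPS = 50; // cap iterations so a stuck agent cannot loop forever
  for (let i = 0; i < MAX_STEPS; i++) {
    const state = await captureBrowserState(tabId);          // observe
    const step = await chooseNextAction(instruction, state); // decide
    if (step.kind === "done") return;                        // task complete
    await performAction(tabId, step);                        // act
  }
}
```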
As an early prototype, Project Mariner is currently limited to typing, scrolling, and clicking in the active browser tab (the page you have open), and it asks for user confirmation before taking sensitive actions such as making purchases or payments.
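A plausible, and again entirely assumed, shape for that action layer: a small vocabulary covering exactly typing, scrolling, and clicking, with a confirmation prompt gating anything flagged as sensitive before it runs in the active tab. The `AgentAction` type and the `askUserToConfirm` helper are hypothetical:

```typescript
// Hypothetical action vocabulary: the three things the prototype is said
// to do, plus a flag for steps that need user sign-off.
type AgentAction =
  | { kind: "type"; selector: string; text: string }
  | { kind: "scroll"; deltaY: number }
  | { kind: "click"; selector: string; sensitive?: boolean };

// Assumed UI helper, e.g. a confirmation dialog rendered by the extension.
declare function askUserToConfirm(message: string): Promise<boolean>;

async function dispatch(tabId: number, action: AgentAction): Promise<void> {
  // Sensitive steps (say, a "Place order" button) need explicit approval.
  if (action.kind === "click" && action.sensitive) {
    const approved = await askUserToConfirm(
      `Allow the agent to click ${action.selector}?`
    );
    if (!approved) return;
  }

  // Run the action inside the active tab only.
  await chrome.scripting.executeScript({
    target: { tabId },
    args: [action],
    func: (a: AgentAction) => {
      switch (a.kind) {
        case "type": {
          const input = document.querySelector<HTMLInputElement>(a.selector);
          if (input) input.value = a.text;
          break;
        }
        case "scroll":
          window.scrollBy(0, a.deltaY);
          break;
        case "click":
          document.querySelector<HTMLElement>(a.selector)?.click();
          break;
      }
    },
  });
}
```

Keeping the confirmation check outside the injected function matters in a design like this: the page being automated should never be able to approve its own sensitive actions.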
Evaluated on WebVoyager, a benchmark designed to measure end-to-end AI agent performance on real-world web tasks, Project Mariner achieved a state-of-the-art result of 83.5% working as a single agent.
Google is initially making Project Mariner available to a small group of trusted developers for testing, and plans to gradually expand the pool so that more developers, and eventually the general public, can try this form of human-computer interaction.