Earlier this month, the Gemini 2.5 Computer Use model was announced. The model specializes in interacting with graphical user interfaces (UIs). This is useful when no structured API exists for the model to call (via function calling). Instead, you can use the Computer Use model to operate the user interface itself, for example by filling in and submitting forms.
It’s important to note that the model does not interact with the UI directly. As input, it receives the user request, a screenshot of the environment, and a history of recent actions. As output, it generates a function call representing a UI action, such as clicking or typing (see the full list of supported UI actions). The client-side code is responsible for executing the received action, and the process then repeats in a loop with a fresh screenshot until the task is done.
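
To make the loop concrete, here is a minimal sketch in Python. It does not call the real Gemini API: the model is replaced by a scripted stand-in, and every name in it (`Action`, `FakeBrowser`, `make_scripted_model`, `run_agent_loop`, and the `type_text` and `click` action names) is hypothetical and chosen only for illustration. It also assumes the model signals completion by returning no action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """A UI action proposed by the model (hypothetical shape)."""
    name: str
    args: Dict[str, str]


@dataclass
class FakeBrowser:
    """Stand-in for the real environment (a browser, a desktop, ...)."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> bytes:
        # A real client would capture pixels; here we just serialize the state.
        return repr((sorted(self.fields.items()), self.submitted)).encode()

    def execute(self, action: Action) -> None:
        # The client, not the model, is what actually touches the UI.
        if action.name == "type_text":
            self.fields[action.args["field"]] = action.args["text"]
        elif action.name == "click":
            if action.args["target"] == "submit":
                self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action.name}")


# A model takes (request, screenshot, recent history) and returns an action,
# or None when it considers the task finished (an assumption of this sketch).
Model = Callable[[str, bytes, List[Action]], Optional[Action]]


def make_scripted_model(script: List[Action]) -> Model:
    """Build a fake model that replays a fixed list of actions."""
    remaining = iter(script)

    def model(request: str, screenshot: bytes, history: List[Action]) -> Optional[Action]:
        return next(remaining, None)

    return model


def run_agent_loop(request: str, env: FakeBrowser, model: Model,
                   max_steps: int = 10, history_window: int = 5) -> List[Action]:
    """Screenshot, ask the model, execute the action, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        # Only the most recent actions are sent along with the screenshot.
        action = model(request, env.screenshot(), history[-history_window:])
        if action is None:
            break
        env.execute(action)
        history.append(action)
    return history


browser = FakeBrowser()
model = make_scripted_model([
    Action("type_text", {"field": "email", "text": "a@example.com"}),
    Action("click", {"target": "submit"}),
])
steps = run_agent_loop("Sign up with a@example.com", browser, model)
print([a.name for a in steps], browser.submitted)
```

The `max_steps` cap is a design choice of the sketch rather than something the model requires: it keeps the client from looping forever if the model never signals that it is finished.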




