Introducing Google Gen AI .NET SDK

Last year, we announced the Google Gen AI SDK as the new unified library for Gemini on Google AI (via the Gemini Developer API) and Vertex AI (via the Vertex AI API). At the time, it was only a Python SDK. Since then, the team has been busy adding support for Go, Node.js, and Java but my favorite language, C#, was missing until now. Today, I’m happy to announce that we now have a Google Gen AI . Read More ↗︎

Search Flights with Gemini Computer Use model

Earlier this month, the Gemini 2.5 Computer Use model was announced. This model is specialized in interacting with graphical user interfaces (UI). This is useful in scenarios where a structured API does not exist for the model to interact with (via function calling). Instead, you can use the Computer Use model to directly interact with user interfaces such as filling and submitting forms. It’s important to note that the model does not interact with the UI directly. Read More →

Secure your LLM apps with Google Cloud Model Armor

It’s crucial to secure inputs and outputs to and from your Large Language Model (LLM). Failure to do so can result in prompt injections, jailbreaking, sensitive information exposure, and more (as detailed in OWASP Top 10 for Large Language Model Applications). I previously talked about LLM Guard and Vertex AI and showed how to use LLM Guard to secure LLMs. Google Cloud has its own service to secure LLMs: Model Armor. Read More →

Gen AI Evaluation Service - Multimodal Metrics

This is the sixth and final post in my Vertex AI Gen AI Evaluation Service blog post series. In the previous posts, we covered computation-based, model-based, tool-use, and agent metrics. These metrics measure different aspects of an LLM response in different ways but one thing they all had in common: they are all for text-based outputs. LLMs nowadays also produce multimodal (images, videos) outputs. How do you evaluate multimodal outputs? That’s the topic of this blog post. Read More →

Gen AI Evaluation Service - Agent Metrics

In my previous Gen AI Evaluation Service - Tool-Use Metrics post, we talked about LLMs calling external tools and how you can use tool-use metrics to evaluate how good those tool calls are. In today’s fifth post of my Vertex AI Gen AI Evaluation Service blog post series, we will talk about a related topic: agents and agent metrics. What are agents? There are many definitions of agents but an agent is essentially a piece of software that acts autonomously to achieve specific goals. Read More →

Gen AI Evaluation Service - Tool-Use Metrics

I’m continuing my Vertex AI Gen AI Evaluation Service blog post series. In today’s fourth post of the series, I will talk about tool-use metrics. What is tool use? Tool use, also known as function calling, provides the LLM with definitions of external tools (for example, a get_current_weather function). When processing a prompt, the model determines if a tool is needed and, if so, outputs structured data specifying the tool to call and its parameters (for example, get_current_weather(location='London')). Read More →

Gen AI Evaluation Service - Model-Based Metrics

In the Gen AI Evaluation Service - An Overview post, I introduced Vertex AI’s Gen AI evaluation service and talked about the various classes of metrics it supports. In the Gen AI Evaluation Service - Computation-Based Metrics post, we delved into computation-based metrics, what they provide, and discussed their limitations. In today’s third post of the series, we’ll dive into model-based metrics. The idea of model-based metrics is to use a judge model to evaluate the output of a candidate model. Read More →

Gen AI Evaluation Service - Computation-Based Metrics

In my Gen AI Evaluation Service - An Overview post, I introduced Vertex AI’s Gen AI evaluation service and talked about the various classes of metrics it supports. In today’s post, I want to dive into computation-based metrics, what they provide, and discuss their limitations. Computation-based metrics are metrics that can be calculated using a mathematical formula. They’re deterministic – the same input produces the same score, unlike model-based metrics where you might get slightly different scores for the same input. Read More →

Gen AI Evaluation Service - An Overview

Generating content with Large Language Models (LLMs) is easy. Determining whether the generated content is good is hard. That’s why evaluating LLM outputs with metrics is crucial. Previously, I talked about DeepEval and Promptfoo as some of the tools you can use for LLM evaluation. I also talked about RAG triad metrics specifically for Retrieval Augmented Generation (RAG) evaluation for LLMs. In the next few posts, I want to talk about a Google Cloud specific evaluation service: the Gen AI evaluation service in Vertex AI. Read More →

Evaluating RAG pipelines with the RAG triad

Retrieval-Augmented Generation (RAG) emerged as a dominant framework for feeding Large Language Models (LLMs) the context beyond the scope of their training data and enabling LLMs to respond with more grounded answers and fewer hallucinations based on that context. However, designing an effective RAG pipeline can be challenging. You need to answer questions such as: How should you parse and chunk text documents for vector embedding? What chunk size and overlay size should you use? Read More ↗︎