Google Cloud Platform

Control LLM costs with context caching

Posted on July 19, 2024

Introduction

Some large language models (LLMs), such as Gemini 1.5 Flash or Gemini 1.5 Pro, have a very large context window. This is very useful if you want to analyze a big chunk of data, such as a whole book or a long video. On the other hand, it can get quite expensive if you keep sending the same large data in your prompts. Context caching can help.

Control LLM output with response type and schema

Posted on July 15, 2024

Introduction

Large language models (LLMs) are great at generating content, but the output format you get back can be hit or miss.

For example, you ask for JSON output in a certain format and you might get free-form text, JSON wrapped in a markdown string, or proper JSON with some required fields missing. If your application requires a strict format, this can be a real problem.
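The three failure modes above can be told apart with a few lines of standard-library Python. This is only a minimal sketch for illustration: the `classify_output` helper and the `REQUIRED_FIELDS` set are hypothetical names, not part of any Vertex AI library, and the check is far simpler than real schema validation.

```python
import json

# Hypothetical required fields, chosen only for this illustration.
REQUIRED_FIELDS = {"name", "price"}

# A markdown code fence, built this way to keep the example easy to embed.
FENCE = "`" * 3


def classify_output(text: str) -> str:
    """Classify raw model output into one of the failure modes above."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        return "markdown-wrapped"
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        return "free-form text"
    if not isinstance(data, dict):
        return "unexpected JSON type"
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        return "missing fields: " + ", ".join(sorted(missing))
    return "ok"


samples = [
    "Sure! The item is a book that costs 12 dollars.",
    FENCE + 'json\n{"name": "book", "price": 12}\n' + FENCE,
    '{"name": "book"}',
    '{"name": "book", "price": 12}',
]
results = [classify_output(sample) for sample in samples]
```

A check like this only detects a bad format after the fact, which is why constraining the output at generation time is attractive.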

GenAI VertexAI Gemini Google Cloud Platform

RAG API powered by LlamaIndex on Vertex AI

Posted on July 8, 2024

Introduction

Recently, I talked about why grounding LLMs is important and how to ground LLMs with public data using Google Search (Vertex AI’s Grounding with Google Search: how to use it and why) and with private data using Vertex AI Search (Grounding LLMs with your own data using Vertex AI Search).

In today’s post, I want to talk about another, more flexible and customizable way of grounding your LLMs with private data: the RAG API powered by LlamaIndex on Vertex AI.

GenAI VertexAI Gemini Google Cloud Platform

Grounding LLMs with your own data using Vertex AI Search

Posted on July 1, 2024 (Last modified on June 25, 2024)

Introduction

In my previous Vertex AI’s Grounding with Google Search: how to use it and why post, I explained why you need grounding with large language models (LLMs) and how Vertex AI’s grounding with Google Search can help to ground LLMs with public up-to-date data.

That’s great but you sometimes need to ground LLMs with your own private data. How can you do that? There are many ways but Vertex AI Search is the easiest way and that’s what I want to talk about today with a simple use case.

GenAI VertexAI Gemini Google Cloud Platform

Give your LLM a quick lie detector test

Posted on June 6, 2024

Introduction

It’s no secret that LLMs sometimes lie and they do so in a very confident kind of way. This might be OK for some applications but it can be a real problem if your application requires high levels of accuracy.

I remember when the first LLMs emerged back in early 2023. I tried some of the early models and it felt like they were hallucinating half of the time. More recently, it has started to feel like LLMs are getting better at giving factual answers. But it’s just a feeling and you can’t base application decisions (or any decision?) on feelings, can you?

GenAI VertexAI Gemini Google Cloud Platform

Vertex AI's Grounding with Google Search - how to use it and why

Posted on May 29, 2024 (Last modified on May 30, 2024)

Introduction

Once in a while, you come across a feature that is so easy to use and so useful that you don’t know how you lived without it before. For me, Vertex AI’s Grounding with Google Search is one of those features.

In this blog post, I explain why you need grounding with large language models (LLMs) and how Vertex AI’s Grounding with Google Search can help with minimal effort on your part.

GenAI VertexAI Gemini Google Cloud Platform

A tour of Gemini 1.5 Pro samples

Posted on May 7, 2024

Introduction

Back in February, Google announced Gemini 1.5 Pro with its impressive 1 million token context window.

A larger context size means that Gemini 1.5 Pro can process vast amounts of information in one go: 1 hour of video, 11 hours of audio, 30,000 lines of code, or over 700,000 words. The good news is that there’s good language support.

In this blog post, I will point out some samples utilizing Gemini 1.5 Pro in Google Cloud’s Vertex AI in different use cases and languages (Python, Node.js, Java, C#, Go).

GenAI VertexAI Gemini Google Cloud Platform

Making API calls exactly once when using Workflows

Posted on May 3, 2024

One challenge with any distributed system, including Workflows, is ensuring that requests sent from one service to another are processed exactly once, when needed; for example, when placing a customer order in a shipping queue, withdrawing funds from a bank account, or processing a payment.

In this blog post, we’ll provide an example of a website invoking Workflows, and Workflows in turn invoking a Cloud Function. We’ll show how to make sure that both the Workflows and the Cloud Function logic run only once. We’ll also talk about how to invoke Workflows exactly once when using HTTP callbacks, Pub/Sub messages, or Cloud Tasks.
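One common way to get exactly-once processing when callers may retry is an idempotency key: the caller sends the same key with every retry, and the receiver remembers which keys it has already handled. The sketch below illustrates that general idea only. It is not necessarily the approach the post takes, `OrderService` is a hypothetical in-memory stand-in for the logic behind a Cloud Function, and a real service would keep the keys in durable storage.

```python
import uuid


class OrderService:
    """Hypothetical stand-in for the service a workflow step would call."""

    def __init__(self):
        self._results = {}      # idempotency key -> stored result
        self.orders_placed = 0  # counts how often the side effect really ran

    def place_order(self, idempotency_key: str, item: str) -> dict:
        # A retry with a known key returns the stored result, no new order.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.orders_placed += 1
        result = {"order_id": self.orders_placed, "item": item}
        self._results[idempotency_key] = result
        return result


service = OrderService()
key = str(uuid.uuid4())                    # generated once by the caller
first = service.place_order(key, "book")
retry = service.place_order(key, "book")   # simulated retry after a timeout
```

Here the retry returns the same result as the first call, and the order is placed only once.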

Workflows Serverless Google Cloud Platform How To

C# and Vertex AI Gemini streaming API bug and workaround

Posted on May 1, 2024

A user recently reported an intermittent error with C# and the Gemini 1.5 model on Vertex AI’s streaming API. In this blog post, I want to outline what the error is, what causes it, and how to avoid it, in the hope of saving someone out there some frustration.

Error

The user reported using version 2.27.0 of the Google.Cloud.AIPlatform.V1 library to call Gemini 1.5 via Vertex AI’s streaming API and running into an intermittent System.IO.IOException.

GenAI VertexAI Gemini DotNet Google Cloud Platform

A Tour of Gemini Code Assist - Slides and Demos

Posted on April 24, 2024

This week, I’m speaking at 3 meetups on Gemini Code Assist. My talk has a little introduction to GenAI and Gemini, followed by a series of hands-on demos that showcase different features of Gemini Code Assist.

In the demos, I set up Gemini Code Assist in the Cloud Code IDE plugin in Visual Studio Code. Then, I show how to design and create an application; explain, run, generate, test, and transform code; and finish with understanding logs with the help of Gemini.

GenAI VertexAI Gemini Google Cloud Platform