Give your LLM a quick lie detector test

It’s no secret that LLMs sometimes lie, and they do so with great confidence. This might be OK for some applications, but it can be a real problem if your application requires high accuracy. I remember when the first LLMs emerged back in early 2023. I tried some of the early models, and it felt like they were hallucinating half of the time. More recently, LLMs have started to feel better at giving factual answers. Read More →

The Consistency vs. Novelty Dilemma

It’s been a while since I wrote about a non-work-related topic. Last time, I wrote about the unique kindness I experienced in Japan (see The Butterfly effect of kindness). This time, I want to write about a dilemma that I’ve been thinking about for a while. When I reflect on my life so far, whenever I made progress (learning a new skill, making new lasting connections, changing to a new job, losing weight), it was always due to consistency in my life. Read More →

Vertex AI's Grounding with Google Search - how to use it and why

Once in a while, you come across a feature that is so easy to use and so useful that you don’t know how you lived without it. For me, Vertex AI’s Grounding with Google Search is one of those features. In this blog post, I explain why you need grounding with large language models (LLMs) and how Vertex AI’s Grounding with Google Search can help with minimal effort on your part. Read More ↗︎

AsyncAPI gets a new version 3.0 and new operations

Almost one year ago, I talked about AsyncAPI 2.6 and how confusing its publish and subscribe operations can be in my Understanding AsyncAPI’s publish & subscribe semantics with an example post. Since then, a new 3.0 version of AsyncAPI has been released with breaking changes and totally new send and receive operations. In this blog post, I want to revisit the example from last year and show how to rewrite it for AsyncAPI 3. Read More →

A tour of Gemini 1.5 Pro samples

Back in February, Google announced Gemini 1.5 Pro with its impressive 1 million token context window. A larger context size means that Gemini 1.5 Pro can process vast amounts of information in one go: 1 hour of video, 11 hours of audio, 30,000 lines of code, or over 700,000 words. The good news is that there’s good language support. In this blog post, I will point out some samples utilizing Gemini 1. Read More →

Making API calls exactly once when using Workflows

One challenge with any distributed system, including Workflows, is ensuring that requests sent from one service to another are processed exactly once, when needed; for example, when placing a customer order in a shipping queue, withdrawing funds from a bank account, or processing a payment. In this blog post, we’ll provide an example of a website invoking Workflows, and Workflows in turn invoking a Cloud Function. We’ll show how to make sure both the Workflows and the Cloud Function logic run only once. Read More ↗︎

C# and Vertex AI Gemini streaming API bug and workaround

A user recently reported an intermittent error with C# and the Gemini 1.5 model on Vertex AI’s streaming API. In this blog post, I want to outline what the error is, what causes it, and how to avoid it, in the hope of saving some frustration for someone out there. The user reported using version 2.27.0 of the Google.Cloud.AIPlatform.V1 library to use Gemini 1.5 via Vertex AI’s streaming API and running into an intermittent System. Read More →

A Tour of Gemini Code Assist - Slides and Demos

This week, I’m speaking at 3 meetups on Gemini Code Assist. My talk has a short introduction to GenAI and Gemini, followed by a series of hands-on demos that showcase different features of Gemini Code Assist. In the demos, I set up Gemini Code Assist in the Cloud Code IDE plugin in Visual Studio Code. Then, I show how to design and create an application; explain, run, generate, test, and transform code; and finish with understanding logs with the help of Gemini. Read More →

Vertex AI Gemini generateContent (non-streaming) API

In my recent blog post, I explored Vertex AI’s Gemini REST API and mainly talked about the streamGenerateContent method, which is a streaming API. Recently, a new method appeared in the Vertex AI docs: generateContent, which is the non-streaming (unary) version of the API. In this short blog post, I take a closer look at the new non-streaming generateContent API and explain why it makes sense to use it as a simpler API when latency is not critical. Read More →

Orchestrate Vertex AI’s PaLM and Gemini APIs with Workflows

Everyone is excited about generative AI (gen AI) nowadays, and rightfully so. You might be generating text with PaLM 2 or Gemini Pro, generating images with Imagen 2, translating code from one language to another with Codey, or describing images and videos with Gemini Pro Vision. No matter how you’re using gen AI, at the end of the day, you’re calling an endpoint with an SDK, a library, or a REST API. Read More ↗︎