Gemini on Vertex AI and Google AI now unified with the new Google Gen AI SDK

If you’ve been working with Gemini, you’ve likely encountered its two separate client libraries: one for Google AI and another for Vertex AI in Google Cloud. Even though the two libraries are quite similar, there are slight differences that make them non-interchangeable. I usually start my experiments in Google AI, and when it’s time to switch to Vertex AI on Google Cloud, I can’t simply copy and paste my code. Read More →
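As a taste of what the unified SDK looks like, here’s a minimal sketch assuming the google-genai package; the project, location, and model id are placeholders, not from the post:

```python
# Minimal sketch of the unified Google Gen AI SDK (google-genai package).
# Project, location, and model id are placeholders.
from google import genai

# Same client class for both backends: flip a flag to target Vertex AI
# instead of the Google AI (API key) endpoint.
client = genai.Client(vertexai=True, project="your-project", location="us-central1")
# client = genai.Client(api_key="YOUR_API_KEY")  # Google AI variant

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Why is the sky blue?",
)
print(response.text)
```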

Control LLM output with LangChain's structured and Pydantic output parsers

In my previous Control LLM output with response type and schema post, I talked about how you can define a JSON response schema and have Vertex AI make sure the output of the Large Language Model (LLM) conforms to that schema. In this post, I show how you can implement a similar response schema using LangChain’s structured output parser with any model, and how the Pydantic output parser can go a step further and parse the output into Python classes automatically. Read More →
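For a flavor of the approach, here’s a minimal sketch of LangChain’s PydanticOutputParser; the Recipe schema and the choice of ChatVertexAI are illustrative assumptions:

```python
# Minimal sketch: parse LLM output directly into a Pydantic class.
# The Recipe schema and model name are illustrative, not from the post.
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_google_vertexai import ChatVertexAI
from pydantic import BaseModel, Field

class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")

parser = PydanticOutputParser(pydantic_object=Recipe)

prompt = PromptTemplate(
    template="Answer the query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# The parser injects format instructions into the prompt, then validates
# and deserializes the model's JSON answer into a Recipe instance.
chain = prompt | ChatVertexAI(model_name="gemini-1.5-flash") | parser
recipe = chain.invoke({"query": "Give me a simple pancake recipe."})
print(recipe.name, recipe.ingredients)
```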

Tracing with Langtrace and Gemini

Large Language Models (LLMs) feel like a totally new technology with totally new problems. That’s true to some extent, but at the same time they have the same old problems we had to tackle in traditional software. For example, how do you figure out which LLM calls are taking too long or failing? At the bare minimum, you need logging, but ideally you use a full observability framework like OpenTelemetry with logging, tracing, metrics, and more. Read More →
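Getting started is roughly a two-liner; a minimal sketch, assuming the langtrace-python-sdk package and an API key in the environment:

```python
# Minimal sketch: initialize Langtrace before any LLM calls so that
# subsequent Gemini requests are traced automatically. The API key
# handling is an assumption for illustration.
import os

from langtrace_python_sdk import langtrace

langtrace.init(api_key=os.environ["LANGTRACE_API_KEY"])

# ...from here on, instrumented LLM client calls emit OpenTelemetry traces.
```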

Batch prediction in Gemini

LLMs are great at generating content on demand, but if left unchecked, you can end up with a large bill at the end of the day. In my Control LLM costs with context caching post, I talked about how to limit costs by using context caching. Batch generation is another technique you can use to process requests at a discounted price. What’s batch generation? Batch generation in Gemini allows you to send multiple generative AI requests in batches rather than one by one, and receive the responses asynchronously in either a Cloud Storage bucket or a BigQuery table. Read More →
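Here’s a minimal sketch of submitting a batch job with the Vertex AI SDK; the project, bucket paths, and model version are placeholders, not from the post:

```python
# Minimal sketch of Gemini batch prediction on Vertex AI. The project,
# bucket paths, and model version are placeholders.
import vertexai
from vertexai.batch_prediction import BatchPredictionJob

vertexai.init(project="your-project", location="us-central1")

# Input: a JSONL file of requests in Cloud Storage (BigQuery also works).
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-002",
    input_dataset="gs://your-bucket/batch_requests.jsonl",
    output_uri_prefix="gs://your-bucket/batch_output/",
)
print(job.state)  # Poll until the asynchronous job completes
```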

LLM Guard and Vertex AI

I’ve been focusing on evaluation frameworks lately because I believe the hardest problem in using LLMs is making sure they behave properly. Are you getting the right outputs, grounded in your data? Are outputs free of harmful content and personally identifiable information (PII)? When you make a change to your RAG pipeline or your prompts, do outputs get better or worse? How do you know? You don’t, unless you measure. Read More →
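As a preview of the kind of checks involved, here’s a minimal LLM Guard sketch that scans a prompt for PII and toxicity before it reaches the model; the specific scanners chosen are illustrative assumptions:

```python
# Minimal sketch: scan a prompt with LLM Guard before sending it to the LLM.
# The scanner selection here is illustrative.
from llm_guard import scan_prompt
from llm_guard.input_scanners import Anonymize, Toxicity
from llm_guard.vault import Vault

vault = Vault()  # Stores originals of anonymized values
scanners = [Anonymize(vault), Toxicity()]

prompt = "My name is Jane Doe and my email is jane@example.com. Summarize my account."
sanitized_prompt, results_valid, results_score = scan_prompt(scanners, prompt)

if all(results_valid.values()):
    print("Safe to send:", sanitized_prompt)
else:
    print("Prompt failed checks:", results_score)
```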

Promptfoo and Vertex AI

In my previous DeepEval and Vertex AI blog post, I talked about how crucial it is to have an evaluation framework in place when working with Large Language Models (LLMs) and introduced DeepEval as one such framework. Recently, I came across another LLM evaluation and security framework called Promptfoo. In this post, I introduce Promptfoo, show what it provides for evaluation and security testing, and explain how it can be used with Vertex AI. Read More →
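Promptfoo is driven by a YAML config and a CLI rather than a Python API, so the sketch below is YAML; the prompt, provider id, and assertion values are illustrative assumptions:

```yaml
# Minimal promptfooconfig.yaml sketch; prompt, provider, and assertion
# values are illustrative, not from the post. Run with: npx promptfoo eval
prompts:
  - "Summarize in one sentence: {{text}}"

providers:
  - vertex:gemini-1.5-flash

tests:
  - vars:
      text: "Promptfoo is an open-source LLM evaluation framework."
    assert:
      - type: contains
        value: "evaluation"
```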

Firestore for Image Embeddings

In my previous Firestore for Text Embedding and Similarity Search post, I talked about how Firestore and LangChain can help you store text embeddings and run similarity searches against them. With multimodal embedding models, you can generate embeddings not only for text but for images and video as well. In this post, I show you how to store image embeddings in Firestore and later use them for similarity search. Read More →
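Here’s a minimal sketch of the two halves, assuming Vertex AI’s multimodal embedding model and Firestore’s vector search; the file, collection, and field names are placeholders:

```python
# Minimal sketch: embed an image with Vertex AI's multimodal model, store
# the vector in Firestore, then run a nearest-neighbor search against it.
# File, collection, and field names are placeholders.
from google.cloud import firestore
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure
from google.cloud.firestore_v1.vector import Vector
from vertexai.vision_models import Image, MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
embedding = model.get_embeddings(image=Image.load_from_file("photo.png"))

db = firestore.Client()
db.collection("images").add({
    "file": "photo.png",
    "embedding": Vector(embedding.image_embedding),
})

# Similarity search: find stored images closest to a query image.
query = model.get_embeddings(image=Image.load_from_file("query.png"))
results = db.collection("images").find_nearest(
    vector_field="embedding",
    query_vector=Vector(query.image_embedding),
    distance_measure=DistanceMeasure.COSINE,
    limit=3,
)
```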

Firestore for Text Embedding and Similarity Search

In my previous Persisting LLM chat history to Firestore post, I showed how to persist chat messages in Firestore for more meaningful and context-aware conversations. Another common requirement in LLM applications is to ground responses in your data for more relevant answers. For that, you need embeddings. In this post, I want to talk specifically about text embeddings and how Firestore and LangChain can help you store them and run similarity searches against them. Read More →
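A minimal sketch of the idea, assuming the langchain-google-firestore and langchain-google-vertexai packages; the texts, collection name, and embedding model are placeholders:

```python
# Minimal sketch: store text embeddings in Firestore via LangChain and run
# a similarity search. Collection name, texts, and model are placeholders.
from langchain_google_firestore import FirestoreVectorStore
from langchain_google_vertexai import VertexAIEmbeddings

embedding = VertexAIEmbeddings(model_name="text-embedding-004")

vector_store = FirestoreVectorStore(
    collection="text_embeddings",
    embedding_service=embedding,
)
vector_store.add_texts([
    "Firestore is a serverless document database.",
    "LangChain integrates with many vector stores.",
])

docs = vector_store.similarity_search("What is Firestore?", k=1)
print(docs[0].page_content)
```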

Persisting LLM chat history to Firestore

Firestore has long been my go-to NoSQL backend for my serverless apps. Recently, it’s becoming my go-to backend for my LLM-powered apps too. In this series of posts, I want to show you how Firestore can help with your LLM apps. In the first post of the series, I want to talk about LLM-powered chat applications. I know, not all LLM apps have to be chat-based, but a lot of them are, because LLMs are simply very good at chat-based communication. Read More →
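To make that concrete, here’s a minimal sketch using the langchain-google-firestore package; the session id and collection name are placeholders:

```python
# Minimal sketch: persist chat turns to Firestore with LangChain's
# Firestore integration. Session id and collection name are placeholders.
from langchain_google_firestore import FirestoreChatMessageHistory

history = FirestoreChatMessageHistory(
    session_id="user-123",
    collection="chat_history",
)

history.add_user_message("Hello! My name is Mete.")
history.add_ai_message("Hi Mete! How can I help you today?")

# On the next turn, load the stored messages to keep the conversation
# context-aware across requests.
print(history.messages)
```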

Semantic Kernel and Gemini

When you’re building a Large Language Model (LLM) application, you typically start with the SDK of the LLM you’re trying to talk to. However, at some point, it might make sense to switch to a higher-level framework. This is especially true if you rely on multiple LLMs from different vendors. Instead of learning and using SDKs from multiple vendors, you can learn a single higher-level framework and use it to orchestrate your calls to multiple LLMs. Read More →
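A minimal sketch of that orchestration idea, assuming Semantic Kernel’s Python package with its Google AI connector; the model id and prompt are placeholders:

```python
# Minimal sketch: call Gemini through Semantic Kernel instead of the raw
# SDK. Model id and prompt are placeholders; the connector reads the API
# key from the environment.
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.google.google_ai import GoogleAIChatCompletion

kernel = Kernel()
kernel.add_service(GoogleAIChatCompletion(gemini_model_id="gemini-1.5-flash"))

async def main():
    # The same kernel call would work with a different vendor's connector.
    result = await kernel.invoke_prompt("Why is the sky blue? One sentence.")
    print(result)

asyncio.run(main())
```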