Batch prediction in Gemini

LLMs are great at generating content on demand, but if left unchecked, they can leave you with a large bill at the end of the day. In my Control LLM costs with context caching post, I talked about how to limit costs by using context caching. Batch generation is another technique you can use to save time at a discounted price. What’s batch generation? Batch generation in Gemini allows you to send multiple generative AI requests in batches rather than one by one and get the responses asynchronously in either a Cloud Storage bucket or a BigQuery table. Read More →

LLM Guard and Vertex AI

I’ve been focusing on evaluation frameworks lately because I believe that the hardest problem when using LLMs is making sure they behave properly. Are you getting the right outputs, grounded in your data? Are the outputs free of harmful content and PII? When you make a change to your RAG pipeline or to your prompts, are the outputs getting better or worse? How do you know? You don’t know unless you measure. Read More →

Promptfoo and Vertex AI

In my previous DeepEval and Vertex AI blog post, I talked about how crucial it is to have an evaluation framework in place when working with Large Language Models (LLMs) and introduced DeepEval as one such evaluation framework. Recently, I came across another LLM evaluation and security framework called Promptfoo. In this post, I will introduce Promptfoo, show what it provides for evaluations and security testing, and show how it can be used with Vertex AI. Read More →

Firestore for Image Embeddings

In my previous Firestore for Text Embedding and Similarity Search post, I talked about how Firestore and LangChain can help you to store text embeddings and do similarity searches against them. With multimodal embedding models, you can generate embeddings not only for text but for images and video as well. In this post, I will show you how to store image embeddings in Firestore and later use them for similarity search. Read More →

Firestore for Text Embedding and Similarity Search

In my previous Persisting LLM chat history to Firestore post, I showed how to persist chat messages in Firestore for more meaningful and context-aware conversations. Another common requirement in LLM applications is to ground responses in data for more relevant answers. For that, you need embeddings. In this post, I want to talk specifically about text embeddings and how Firestore and LangChain can help you to store text embeddings and do similarity searches against them. Read More →

Persisting LLM chat history to Firestore

Firestore has long been my go-to NoSQL backend for my serverless apps. Recently, it has become my go-to backend for my LLM-powered apps too. In this series of posts, I want to show you how Firestore can help with your LLM apps. In the first post of the series, I want to talk about LLM-powered chat applications. I know, not all LLM apps have to be chat-based apps, but a lot of them are because LLMs are simply very good at chat-based communication. Read More →

Semantic Kernel and Gemini

When you’re building a Large Language Model (LLM) application, you typically start with the SDK of the LLM you’re trying to talk to. However, at some point, it might make sense to start using a higher-level framework. This is especially true if you rely on multiple LLMs from different vendors. Instead of learning and using SDKs from multiple vendors, you can learn a higher-level framework and use that to orchestrate your calls to multiple LLMs. Read More →