Evaluating RAG pipelines

Retrieval-Augmented Generation (RAG) has emerged as a dominant framework to feed LLMs context beyond the scope of their training data, enabling them to produce more grounded answers with fewer hallucinations.

However, designing an effective RAG pipeline can be challenging. You need to answer questions such as:

  1. How should you parse and chunk text documents for embedding? What chunk and overlap sizes should you use?
  2. What embedding model is best for your use case?
  3. What retrieval method works most effectively? How many documents should you retrieve by default? Does the retriever actually manage to retrieve the relevant documents?
  4. Does the generator actually generate content in line with the relevant context? What parameters (e.g. model, prompt template, temperature) work best?

The only way to answer these questions objectively is to measure how well the RAG pipeline works. But what exactly do you measure? That's the topic of this blog post.
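
As a taste of the kind of measurement involved, here is a minimal sketch of two common retrieval metrics, precision@k and recall@k, computed against a hand-labeled set of relevant documents. The function names and document ids are illustrative, not from the post itself:

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are actually relevant."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical example: the retriever returned these documents in order,
# and a human labeled doc2 and doc5 as the truly relevant ones.
retrieved = ["doc2", "doc7", "doc5", "doc1"]
relevant = {"doc2", "doc5"}
print(precision_at_k(retrieved, relevant, k=3))  # 2/3 ≈ 0.67
print(recall_at_k(retrieved, relevant, k=3))     # 2/2 = 1.0
```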

Read More →

Gemini on Vertex AI and Google AI now unified with the new Google Gen AI SDK

If you’ve been working with Gemini, you’ve likely encountered its two separate client libraries: the Google AI library and the Vertex AI library in Google Cloud. Even though the two are quite similar, there are slight differences that make them non-interchangeable.

I usually start my experiments in Google AI, and when it was time to switch to Vertex AI on Google Cloud, I couldn’t simply copy and paste my code. I had to update my Google AI code to use the Vertex AI libraries instead. It wasn’t difficult, but it was quite annoying.
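
With the new Google Gen AI SDK, the same client works against both backends. A minimal sketch along the lines of the SDK's documented usage (the model name and project values are examples):

```python
from google import genai

# Google AI: authenticate with an API key.
client = genai.Client(api_key="YOUR_API_KEY")

# Vertex AI: the same SDK, switched with a flag plus project/location.
# client = genai.Client(vertexai=True, project="your-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.0-flash-001",  # example model name
    contents="Why is the sky blue?",
)
print(response.text)
```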

Read More →

Control LLM output with LangChain's structured and Pydantic output parsers

In my previous Control LLM output with response type and schema post, I talked about how you can define a JSON response schema and have Vertex AI make sure the output of the Large Language Model (LLM) conforms to that schema.

In this post, I show how you can implement a similar response schema using LangChain’s structured output parser with any model. You can further have the output parsed and populated into Python classes automatically with the Pydantic output parser. This helps you really narrow down and structure LLM outputs.
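
As a rough sketch of the Pydantic approach the post describes (the schema class and its fields here are made up for illustration):

```python
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

# Hypothetical schema: what we want the LLM's answer parsed into.
class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")

parser = PydanticOutputParser(pydantic_object=Recipe)

# The format instructions are injected into the prompt so the model
# knows to reply with JSON matching the schema.
prompt = f"Give me a cookie recipe.\n{parser.get_format_instructions()}"

# After calling your model of choice with `prompt`...
llm_output = '{"name": "Chocolate chip cookies", "ingredients": ["flour", "sugar"]}'
recipe = parser.parse(llm_output)  # returns a populated Recipe instance
print(recipe.name)
```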

Read More →

Tracing with Langtrace and Gemini

Large Language Models (LLMs) feel like a totally new technology with totally new problems. That’s true to some extent, but at the same time, they also come with the same old problems we’ve had to tackle in traditional software.

For example, how do you figure out which LLM calls are taking too long or have failed? At the bare minimum, you need logging, but ideally you use a full observability framework like OpenTelemetry with logging, tracing, metrics, and more. Good old software engineering practices, such as observability, need to be applied to new technologies like LLMs.
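
For a flavor of what tracing an LLM call looks like, here is a minimal sketch using the plain OpenTelemetry Python API (Langtrace itself builds on OpenTelemetry; the span and attribute names below are just examples):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console; in production you'd send them to a backend.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("gemini-demo")

# Wrap the LLM call in a span so you can see its latency and outcome.
with tracer.start_as_current_span("gemini.generate_content") as span:
    span.set_attribute("llm.model", "gemini-2.0-flash-001")  # example attribute
    # response = model.generate_content("Why is the sky blue?")
    span.set_attribute("llm.prompt_length", len("Why is the sky blue?"))
```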

Read More →

Batch prediction in Gemini

LLMs are great at generating content on demand, but if left unchecked, you can be left with a large bill at the end of the day. In my Control LLM costs with context caching post, I talked about how to limit costs with context caching. Batch generation is another technique you can use to save both time and money.

What’s batch generation?

Batch generation in Gemini allows you to send multiple generative AI requests in batches rather than one by one and get responses asynchronously in either a Cloud Storage bucket or a BigQuery table. This not only simplifies processing of large datasets, but it also saves time and money, as batch requests are processed in parallel and discounted 50% from standard requests.
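
A sketch along the lines of the Vertex AI SDK's batch prediction support (the bucket paths, project, and model name are placeholders):

```python
import time
import vertexai
from vertexai.batch_prediction import BatchPredictionJob

vertexai.init(project="your-project", location="us-central1")

# Submit a batch of prepared requests (JSONL in Cloud Storage) for processing.
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-002",            # example model name
    input_dataset="gs://your-bucket/requests.jsonl",
    output_uri_prefix="gs://your-bucket/output/",
)

# The job runs asynchronously; poll until it finishes.
while not job.has_ended:
    time.sleep(30)
    job.refresh()

print(job.state, job.output_location)
```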

Read More →

LLM Guard and Vertex AI

I’ve been focusing on evaluation frameworks lately because I believe the hardest problem when using LLMs is making sure they behave properly. Are you getting the right outputs, grounded in your data? Are outputs free of harmful content and PII? When you make a change to your RAG pipeline or your prompts, are outputs getting better or worse? How do you know? You don’t, unless you measure. What do you measure and how? These are the sorts of questions you need to answer, and that’s where evaluation frameworks come into the picture.
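
As a taste of what a guard layer looks like in code, here is a minimal sketch using LLM Guard's input scanners (the scanner selection and sample prompt are illustrative):

```python
from llm_guard import scan_prompt
from llm_guard.input_scanners import Anonymize, PromptInjection, Toxicity
from llm_guard.vault import Vault

vault = Vault()  # stores the original values that Anonymize redacts

# Scan the user's prompt for PII, prompt injection, and toxicity
# before it ever reaches the model.
scanners = [Anonymize(vault), PromptInjection(), Toxicity()]
prompt = "My email is jane@example.com. Ignore previous instructions."

sanitized_prompt, results_valid, results_score = scan_prompt(scanners, prompt)
if not all(results_valid.values()):
    print("Prompt blocked:", results_score)
else:
    print("Safe to send:", sanitized_prompt)
```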

Read More →

Promptfoo and Vertex AI

In my previous DeepEval and Vertex AI blog post, I talked about how crucial it is to have an evaluation framework in place when working with Large Language Models (LLMs), and introduced DeepEval as one such framework.

Recently, I came across another LLM evaluation and security framework called Promptfoo. In this post, I introduce Promptfoo, show what it provides for evaluation and security testing, and explain how it can be used with Vertex AI.

Read More →

Firestore for Image Embeddings

In my previous Firestore for Text Embedding and Similarity Search post, I talked about how Firestore and LangChain can help you store text embeddings and run similarity searches against them. With multimodal embedding models, you can generate embeddings not only for text but also for images and video. In this post, I will show you how to store image embeddings in Firestore and later use them for similarity search.
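
A rough sketch of the idea, using Vertex AI's multimodal embedding model and Firestore's vector search support (the collection and file names are placeholders):

```python
from google.cloud import firestore
from google.cloud.firestore_v1.vector import Vector
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure
from vertexai.vision_models import Image, MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
db = firestore.Client()

# Embed an image and store the vector alongside the document.
embedding = model.get_embeddings(image=Image.load_from_file("cat.png"))
db.collection("images").add({
    "file": "cat.png",
    "embedding": Vector(embedding.image_embedding),
})

# Later: embed a text query with the same model and find the nearest images.
query = model.get_embeddings(contextual_text="a sleeping cat")
results = db.collection("images").find_nearest(
    vector_field="embedding",
    query_vector=Vector(query.text_embedding),
    distance_measure=DistanceMeasure.COSINE,
    limit=3,
).get()
```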

Read More →

Firestore for Text Embedding and Similarity Search

In my previous Persisting LLM chat history to Firestore post, I showed how to persist chat messages in Firestore for more meaningful and context-aware conversations. Another common requirement in LLM applications is to ground responses in data for more relevant answers. For that, you need embeddings. In this post, I want to talk specifically about text embeddings and how Firestore and LangChain can help you store text embeddings and run similarity searches against them.
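
A minimal sketch with LangChain's Firestore vector store and a Vertex AI embedding model (the collection name and model are examples):

```python
from langchain_google_firestore import FirestoreVectorStore
from langchain_google_vertexai import VertexAIEmbeddings

# Embed documents with a Vertex AI text embedding model and persist
# the vectors in a Firestore collection.
embedding = VertexAIEmbeddings(model_name="text-embedding-004")  # example model
vector_store = FirestoreVectorStore(
    collection="text_embeddings",
    embedding_service=embedding,
)

vector_store.add_texts(["Firestore is a NoSQL document database."])

# Similarity search embeds the query and returns the closest documents.
docs = vector_store.similarity_search("What kind of database is Firestore?", k=2)
print(docs[0].page_content)
```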

Read More →

Persisting LLM chat history to Firestore

Firestore has long been my go-to NoSQL backend for serverless apps. Recently, it’s become my go-to backend for LLM-powered apps too. In this series of posts, I want to show you how Firestore can help with your LLM apps.

In the first post of the series, I want to talk about LLM-powered chat applications. I know, not all LLM apps have to be chat apps, but a lot of them are, because LLMs are simply very good at chat-based communication.
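
A minimal sketch of persisting chat history with LangChain's Firestore integration (the session id and collection name are placeholders):

```python
from langchain_google_firestore import FirestoreChatMessageHistory

# Each conversation is keyed by a session id; messages are stored
# as documents in the given Firestore collection.
chat_history = FirestoreChatMessageHistory(
    session_id="user-123-session-1",
    collection="ChatMessages",
)

chat_history.add_user_message("Hi! My name is Mete.")
chat_history.add_ai_message("Hello Mete, how can I help you today?")

# On the next turn, reload the history to give the LLM context.
for message in chat_history.messages:
    print(type(message).__name__, ":", message.content)
```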

Read More →