Firestore for Text Embedding and Similarity Search

In my previous post, Persisting LLM chat history to Firestore, I showed how to persist chat messages in Firestore for more meaningful and context-aware conversations. Another common requirement in LLM applications is to ground responses in data for more relevant answers. For that, you need embeddings. In this post, I want to talk specifically about text embeddings and how Firestore and LangChain can help you store text embeddings and run similarity searches against them. Read More →
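To make "similarity search" a little more concrete, here is a minimal sketch of the underlying idea in plain Python. It is not Firestore's or LangChain's API: the tiny 3-dimensional vectors, the `documents` list, and the `top_k` helper are all hypothetical stand-ins for real embeddings and a real vector store.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Rank (text, vector) pairs by similarity to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical "embeddings"; a real embedding model returns much longer vectors.
documents = [
    ("Firestore stores documents", [0.9, 0.1, 0.0]),
    ("Cats sleep a lot", [0.0, 0.2, 0.9]),
    ("NoSQL databases scale well", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]

results = top_k(query, documents, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A vector store does the same ranking at scale, typically with an index instead of a full scan.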

Persisting LLM chat history to Firestore

Firestore has long been my go-to NoSQL backend for my serverless apps. Recently, it has become my go-to backend for my LLM-powered apps too. In this series of posts, I want to show you how Firestore can help with your LLM apps. In the first post of the series, I want to talk about LLM-powered chat applications. I know, not all LLM apps have to be chat-based, but a lot of them are because LLMs are simply very good at chat-based communication. Read More →

Semantic Kernel and Gemini

When you’re building a Large Language Model (LLM) application, you typically start with the SDK of the LLM you’re trying to talk to. However, at some point, it might make sense to start using a higher-level framework. This is especially true if you rely on multiple LLMs from different vendors. Instead of learning and using SDKs from multiple vendors, you can learn a higher-level framework and use that to orchestrate your calls to multiple LLMs. Read More →

DeepEval and Vertex AI

When you’re working with Large Language Models (LLMs), it’s crucial to have an evaluation framework in place. Only by constantly evaluating and testing your LLM outputs can you tell whether the changes you’re making to prompts or the output you’re getting back from the LLM are actually good. In this blog post, we’ll look into one of those evaluation frameworks called DeepEval, an open-source evaluation framework for LLMs. It allows you to “unit test” LLM outputs in a similar way to Pytest. Read More →

Deep dive into function calling in Gemini

In this blog post, we’ll take a deep dive into function calling in Gemini. More specifically, you’ll see how to handle multiple and parallel function call requests from the generate_content and chat interfaces, and take a look at the new auto function calling feature through a sample weather application. What is function calling? Function calling is useful to augment LLMs with more up-to-date data via external API calls. You can define custom functions and provide these to an LLM. Read More →
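As a rough illustration of what handling multiple function call requests involves on the application side, here is a minimal sketch in plain Python. It does not use the Gemini SDK: the request dictionaries, the `FUNCTIONS` registry, `handle_function_calls`, and the stubbed `get_weather` are all hypothetical, chosen only to show the dispatch step.

```python
def get_weather(location: str) -> dict:
    """Stub for an external weather API call (hypothetical data)."""
    return {"location": location, "forecast": "sunny"}


# Registry of functions the application is willing to run for the model.
FUNCTIONS = {"get_weather": get_weather}


def handle_function_calls(calls):
    """Run each requested function and collect one response per request."""
    responses = []
    for call in calls:
        func = FUNCTIONS.get(call["name"])
        if func is None:
            # Never execute a function the application did not register.
            responses.append({"name": call["name"], "error": "unknown function"})
            continue
        result = func(**call["args"])
        responses.append({"name": call["name"], "response": result})
    return responses


# Assumed shape of two function call requests returned in a single model turn.
model_requests = [
    {"name": "get_weather", "args": {"location": "London"}},
    {"name": "get_weather", "args": {"location": "Paris"}},
]

responses = handle_function_calls(model_requests)
print(responses)
```

In a real application, the collected responses would be sent back to the model so it can produce the final answer.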

Control LLM costs with context caching

Some large language models (LLMs), such as Gemini 1.5 Flash or Gemini 1.5 Pro, have a very large context window. This is very useful if you want to analyze a big chunk of data, such as a whole book or a long video. On the other hand, it can get quite expensive if you keep sending the same large chunk of data in your prompts. Context caching can help. Context caching is useful in reducing costs when a substantial context is referenced repeatedly by shorter requests such as: Read More →

Control LLM output with response type and schema

Large language models (LLMs) are great at generating content, but the output format you get back can be hit or miss. For example, you ask for JSON output in a certain format and you might get free-form text, JSON wrapped in a Markdown string, or proper JSON with some required fields missing. If your application requires a strict format, this can be a real problem. Read More →
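The three failure modes above can be checked on the application side with a few lines of plain Python. This is only a sketch of defensive parsing, not the response type and schema feature itself: `parse_llm_json` and the sample strings are hypothetical.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_llm_json(raw: str, required_fields):
    """Parse an LLM reply as a JSON object, tolerating a Markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    # json.loads raises json.JSONDecodeError (a ValueError) on free-form text.
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in required_fields if field not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


# JSON wrapped in a Markdown string still parses.
wrapped = FENCE + 'json\n{"name": "Ada", "age": 36}\n' + FENCE
print(parse_llm_json(wrapped, ["name", "age"]))

# Free-form text and missing fields are both rejected with a ValueError.
for bad in ["Sure, here is your JSON!", '{"name": "Ada"}']:
    try:
        parse_llm_json(bad, ["name", "age"])
    except ValueError as err:
        print("rejected:", err)
```

Checks like this catch bad output after the fact; constraining the format at generation time avoids the retry loop altogether.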

RAG API powered by LlamaIndex on Vertex AI

Recently, I talked about why grounding LLMs is important and how to ground LLMs with public data using Google Search (Vertex AI’s Grounding with Google Search: how to use it and why) and with private data using Vertex AI Search (Grounding LLMs with your own data using Vertex AI Search). In today’s post, I want to talk about another, more flexible and customizable way of grounding your LLMs with private data: the RAG API powered by LlamaIndex on Vertex AI. Read More →

Grounding LLMs with your own data using Vertex AI Search

In my previous post, Vertex AI’s Grounding with Google Search: how to use it and why, I explained why you need grounding with large language models (LLMs) and how Vertex AI’s grounding with Google Search can help ground LLMs with public, up-to-date data. That’s great, but you sometimes need to ground LLMs with your own private data. How can you do that? There are many ways, but Vertex AI Search is the easiest, and that’s what I want to talk about today with a simple use case. Read More →

Give your LLM a quick lie detector test

It’s no secret that LLMs sometimes lie, and they do so very confidently. This might be OK for some applications, but it can be a real problem if your application requires high levels of accuracy. I remember when the first LLMs emerged back in early 2023. I tried some of the early models and it felt like they were hallucinating half of the time. More recently, it has started to feel like LLMs are getting better at giving factual answers. Read More →