Google Cloud Platform

Firestore for Text Embedding and Similarity Search

Posted on 9 October 2024

In my previous post, Persisting LLM chat history to Firestore, I showed how to persist chat messages in Firestore for more meaningful and context-aware conversations. Another common requirement in LLM applications is to ground responses in data for more relevant answers. For that, you need embeddings. In this post, I want to talk specifically about text embeddings and how Firestore and LangChain can help you store text embeddings and run similarity searches against them. Read More →
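To make the idea of a similarity search concrete, here is a minimal sketch in plain Python. It is not the Firestore or LangChain API; the tiny three-dimensional vectors and the `most_similar` helper are made up for illustration, and real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, documents, k=2):
    """Return the names of the k documents whose vectors are closest to the query."""
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy "embeddings", invented for this example only.
documents = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

print(most_similar(query, documents))
```

A vector store does the same kind of ranking, but over stored embeddings and typically with an index so it does not have to compare the query against every document.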

GenAI GoogleAI VertexAI Gemini Firestore Google Cloud Platform

Persisting LLM chat history to Firestore

Posted on 1 October 2024

Firestore has long been my go-to NoSQL backend for my serverless apps. Recently, it has become my go-to backend for my LLM-powered apps too. In this series of posts, I want to show you how Firestore can help with your LLM apps. In the first post of the series, I want to talk about LLM-powered chat applications. I know, not all LLM apps have to be chat-based, but a lot of them are because LLMs are simply very good at chat-based communication. Read More →

GenAI GoogleAI VertexAI Gemini Firestore Google Cloud Platform

Semantic Kernel and Gemini

Posted on 19 August 2024

Introduction: When you’re building a large language model (LLM) application, you typically start with the SDK of the LLM you’re trying to talk to. However, at some point, it might make sense to start using a higher-level framework. This is especially true if you rely on multiple LLMs from different vendors. Instead of learning and using SDKs from multiple vendors, you can learn one higher-level framework and use it to orchestrate your calls to multiple LLMs. Read More →

GenAI GoogleAI VertexAI Gemini Google Cloud Platform

DeepEval and Vertex AI

Posted on 12 August 2024

Introduction: When you’re working with large language models (LLMs), it’s crucial to have an evaluation framework in place. Only by constantly evaluating and testing your LLM outputs can you tell whether the changes you’re making to prompts, or the output you’re getting back from the LLM, are actually good. In this blog post, we’ll look into one of those frameworks: DeepEval, an open-source evaluation framework for LLMs. It lets you “unit test” LLM outputs in a similar way to Pytest. Read More →

GenAI VertexAI Gemini Google Cloud Platform

Deep dive into function calling in Gemini

Posted on 6 August 2024

Introduction: In this blog post, we’ll take a deep dive into function calling in Gemini. More specifically, you’ll see how to handle multiple and parallel function call requests from the generate_content and chat interfaces, and take a look at the new auto function calling feature, all through a sample weather application. What is function calling? Function calling is useful for augmenting LLMs with more up-to-date data via external API calls. You can define custom functions and provide these to an LLM. Read More →

GenAI VertexAI Gemini Google Cloud Platform

Control LLM costs with context caching

Posted on 19 July 2024

Introduction: Some large language models (LLMs), such as Gemini 1.5 Flash or Gemini 1.5 Pro, have a very large context window. This is very useful if you want to analyze a big chunk of data, such as a whole book or a long video. On the other hand, it can get quite expensive if you keep sending the same large data in your prompts. Context caching can help. Context caching is useful in reducing costs when a substantial context is referenced repeatedly by shorter requests such as: Read More →

GenAI VertexAI Gemini Google Cloud Platform

Control LLM output with response type and schema

Posted on 15 July 2024

Introduction: Large language models (LLMs) are great at generating content, but the output format you get back can be hit or miss. For example, you ask for JSON output in a certain format and you might get free-form text, JSON wrapped in a Markdown string, or valid JSON with some required fields missing. If your application requires a strict format, this can be a real problem. Read More →
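As a rough illustration of the three failure modes just mentioned, the sketch below classifies a raw model reply using only the Python standard library. The `check_output` helper, its return strings, and the sample replies are hypothetical names and data invented here; this is not how Gemini's response type and schema feature works, only a way to see the problem it addresses.

```python
import json
import re

TICKS = "`" * 3  # Markdown code-fence delimiter, built here to avoid writing it literally
FENCED = re.compile(rf"^{TICKS}(?:json)?\s*(.*?)\s*{TICKS}$", re.DOTALL)


def check_output(text, required_fields):
    """Classify a raw LLM reply that was supposed to be a JSON object."""
    stripped = text.strip()
    # Case 1: the JSON is wrapped in a Markdown code fence.
    if FENCED.match(stripped):
        return "wrapped in Markdown"
    # Case 2: the reply is free-form text rather than JSON.
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        return "not JSON"
    if not isinstance(data, dict):
        return "not a JSON object"
    # Case 3: valid JSON, but some required fields are missing.
    missing = [field for field in required_fields if field not in data]
    if missing:
        return "missing fields: " + ", ".join(missing)
    return "ok"


required = ["city", "country"]
print(check_output("Sure! The city is Paris.", required))
print(check_output(TICKS + 'json\n{"city": "Paris"}\n' + TICKS, required))
print(check_output('{"city": "Paris"}', required))
print(check_output('{"city": "Paris", "country": "France"}', required))
```

Checks like this only detect a bad format after the fact, which is why constraining the output at generation time is attractive when an application needs a strict format.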

GenAI VertexAI Gemini Google Cloud Platform

RAG API powered by LlamaIndex on Vertex AI

Posted on 8 July 2024

Introduction: Recently, I talked about why grounding LLMs is important and how to ground LLMs with public data using Google Search (Vertex AI’s Grounding with Google Search: how to use it and why) and with private data using Vertex AI Search (Grounding LLMs with your own data using Vertex AI Search). In today’s post, I want to talk about another, more flexible and customizable way of grounding your LLMs with private data: the RAG API powered by LlamaIndex on Vertex AI. Read More →

GenAI VertexAI Gemini Google Cloud Platform

Grounding LLMs with your own data using Vertex AI Search

Posted on 1 July 2024

Introduction: In my previous post, Vertex AI’s Grounding with Google Search: how to use it and why, I explained why you need grounding with large language models (LLMs) and how Vertex AI’s grounding with Google Search can help ground LLMs with public, up-to-date data. That’s great, but sometimes you need to ground LLMs with your own private data. How can you do that? There are many ways, but Vertex AI Search is the easiest, and that’s what I want to talk about today with a simple use case. Read More →

GenAI VertexAI Gemini Google Cloud Platform

Give your LLM a quick lie detector test

Posted on 6 June 2024

Introduction: It’s no secret that LLMs sometimes lie, and they do so very confidently. This might be OK for some applications, but it can be a real problem if your application requires high levels of accuracy. I remember when the first LLMs emerged back in early 2023. I tried some of the early models and it felt like they were hallucinating half of the time. More recently, it has started to feel like LLMs are getting better at giving factual answers. Read More →

GenAI VertexAI Gemini Google Cloud Platform