Some large language models (LLMs), such as Gemini 1.5 Flash or Gemini 1.5 Pro, have a very large context
window. This is very useful if you want to analyze a big chunk of data, such as
a whole book or a long video. On the other hand, it can get quite expensive if
you keep sending the same large data in your prompts. Context caching can help.
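The idea, in a rough conceptual sketch (the stand-in service and its method names below are made up for illustration; the real Vertex AI SDK API differs): upload the large content once, get back a cache handle, and reference that handle in later prompts instead of resending the content.

```python
import hashlib

class FakeCacheService:
    """Stand-in for a server-side context cache, keyed by content hash."""
    def __init__(self):
        self._store = {}

    def create_cache(self, content: str) -> str:
        # Same content always maps to the same handle.
        cache_id = hashlib.sha256(content.encode()).hexdigest()[:12]
        self._store.setdefault(cache_id, content)
        return cache_id

    def prompt(self, cache_id: str, question: str) -> str:
        context = self._store[cache_id]
        # A real model call would go here; we just show what gets sent.
        return f"answer based on {len(context)} cached chars: {question}"

service = FakeCacheService()
book = "a very long book..." * 1000  # large content uploaded once
cache_id = service.create_cache(book)

# Subsequent prompts send only the short question plus the cache handle.
print(service.prompt(cache_id, "Who is the protagonist?"))
```

The savings come from the second step: every follow-up request carries a short handle rather than the whole book.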
Large language models (LLMs) are great at generating content but the output
format you get back can be hit or miss.
For example, you ask for JSON output in a certain format and you might get
free-form text, JSON wrapped in a markdown string, or proper JSON with
some required fields missing. If your application requires a strict format, this
can be a real problem.
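As a rough sketch of the kind of defensive parsing this forces you to write (the helper below is illustrative, not from any particular library):

```python
import json
import re

def parse_llm_json(raw: str, required_fields: list[str]) -> dict:
    """Defensively parse JSON from LLM output that may be wrapped in markdown."""
    text = raw.strip()
    # Strip a ```json ... ``` fence if the model wrapped its answer in one.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises ValueError on free-form text
    missing = [f for f in required_fields if f not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data

# Handles a markdown-wrapped response:
wrapped = '```json\n{"title": "Dune", "author": "Frank Herbert"}\n```'
print(parse_llm_json(wrapped, ["title", "author"]))
```

Even this only catches malformed output; it can't make the model produce the right format in the first place.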
In today’s post, I want to talk about another more flexible and customizable way
of grounding your LLMs with private data: the RAG API powered by LlamaIndex on
Vertex AI.
Posted on July 1, 2024
(Last modified on June 25, 2024)
Introduction
In my previous Vertex AI’s Grounding with Google Search: how to use it and why post, I explained why you need grounding with large language models (LLMs) and how Vertex AI’s grounding with Google Search can help ground LLMs with public, up-to-date data.
That’s great, but you sometimes need to ground LLMs with your own private data. How can you do that? There are many ways, but Vertex AI Search is the easiest, and that’s what I want to talk about today with a simple use case.
It’s no secret that LLMs sometimes lie, and they do so in a very confident way. This might be OK for some applications but it can be a real problem if your application requires high levels of accuracy.
I remember when the first LLMs emerged back in early 2023. I tried some of the early models and it felt like they were hallucinating half of the time. More recently, it started feeling like LLMs are getting better at giving more factual answers. But it’s just a feeling and you can’t base application decisions (or any decision?) on feelings, can you?
It’s been a while since I wrote about a non-work-related topic. Last time, I
wrote about the unique kindness I experienced in Japan (see The Butterfly
effect of kindness). This time, I want to write about a dilemma that I’ve been
thinking about for a while.
When I reflect on my life so far, whenever I had some progress (learning a new
skill, making new lasting connections, changing to a new job, losing weight), it
was always due to consistency in my life. I was not traveling, I was not
thinking about where to go, what to do, where to eat, how to get from point A to
point B. I was in my familiar environment with a consistent (and maybe boring)
routine where the basics of my life were in place. As a result, I had time, got
bored, and started exploring. This consistency-fueled boredom allowed me to
explore an aspect of life that I wasn’t happy about, and I put the time and
energy into improving it.
Posted on May 29, 2024
(Last modified on May 30, 2024)
Introduction
Once in a while, you come across a feature that is so easy to use and so useful
that you don’t know how you lived without it before. For me, Vertex AI’s
Grounding with Google Search is one of those features.
In this blog post, I explain why you need grounding with large language models
(LLMs) and how Vertex AI’s Grounding with Google Search can help with minimal
effort on your part.
Back in February, Google announced Gemini 1.5 Pro with its impressive 1 million
token context window.
Gemini 1.5 Pro
A larger context size means that Gemini 1.5 Pro can process vast amounts of
information in one go — 1 hour of video, 11 hours of audio, 30,000 lines of
code, or over 700,000 words. The good news is that there’s also good language
support.
In this blog post, I will point out some samples utilizing Gemini 1.5 Pro in
Google Cloud’s Vertex AI in different use cases and languages (Python, Node.js,
Java, C#, Go).
One challenge with any distributed system, including Workflows, is ensuring that requests sent from one service to another are processed exactly once, when needed; for example, when placing a customer order in a shipping queue, withdrawing funds from a bank account, or processing a payment.
In this blog post, we’ll provide an example of a website invoking Workflows, and Workflows in turn invoking a Cloud Function. We’ll show how to make sure both the Workflows and Cloud Function logic runs only once. We’ll also talk about how to invoke Workflows exactly once when using HTTP callbacks, Pub/Sub messages, or Cloud Tasks.
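The core pattern behind all of these variations is an idempotency key: the caller generates one key per logical request and reuses it on retries, so the receiver can detect and skip duplicates. Here is a minimal Python sketch of that pattern, with an in-memory dict standing in for a durable store like Firestore (all names are illustrative, not from the actual sample):

```python
import functools
import uuid

# Results of already-processed requests, keyed by idempotency key.
# A real service would persist this (e.g. Firestore) to survive restarts.
_processed = {}

def exactly_once(func):
    """Run func at most once per idempotency key; replay the stored result."""
    @functools.wraps(func)
    def wrapper(idempotency_key, *args, **kwargs):
        if idempotency_key in _processed:
            return _processed[idempotency_key]  # duplicate request: no-op
        result = func(*args, **kwargs)
        _processed[idempotency_key] = result
        return result
    return wrapper

calls = []

@exactly_once
def withdraw(account, amount):
    calls.append(amount)  # the side effect we must not repeat
    return f"withdrew {amount} from {account}"

# One key per logical request, reused on retries.
key = str(uuid.uuid4())
withdraw(key, "acct-1", 100)
withdraw(key, "acct-1", 100)  # retried request is deduplicated
print(len(calls))  # the side effect ran once
```

Note the check-then-store here is not atomic; in a real distributed setup the store lookup and write would need a transaction or a conditional create to be safe under concurrent retries.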