Large Language Models (LLMs) feel like a totally new technology with totally new problems. That's true to some extent, but at the same time, they come with the same old problems we had to tackle in traditional software.
For example, how do you figure out which LLM calls are taking too long or failing? At the bare minimum, you need logging, but ideally you use a full observability stack like OpenTelemetry with logging, tracing, metrics, and more. In other words, you need the good old software engineering practices, such as observability, applied to new technologies like LLMs.
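To see why logging alone only gets you so far, here is a minimal, hedged sketch of hand-rolled timing and error logging around an LLM call. The generate function is a hypothetical stand-in for whatever client call you actually make:

import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def generate(prompt):
    # Hypothetical stand-in for a real LLM client call.
    return f"response to: {prompt}"

def timed_generate(prompt):
    start = time.perf_counter()
    try:
        return generate(prompt)
    except Exception:
        logger.exception("LLM call failed for prompt: %s", prompt)
        raise
    finally:
        logger.info("LLM call took %.2f seconds", time.perf_counter() - start)

This works for a single call, but once you have chains of calls across models and services, you want proper traces instead of ad-hoc log lines.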
In this post, I'll focus on one aspect of observability, tracing, and show you how to trace your LLM calls in an OpenTelemetry-compliant way using Langtrace.
Introduction to Langtrace
Langtrace is an open-source observability tool that collects and analyzes traces to help you improve your LLM apps. It has an SDK to collect traces from LLM APIs, vector databases, and LLM-based frameworks. The traces are OpenTelemetry compatible and can be exported to Langtrace or any other observability stack (Grafana, Datadog, Honeycomb, etc.). There's also a web-based Langtrace Dashboard where you can view and analyze your traces.
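For example, if you prefer to keep traces in your own OpenTelemetry backend instead of the Langtrace dashboard, the SDK can take a custom exporter at init time. Treat this as a rough sketch: the custom_remote_exporter parameter, the OTLP exporter package, and the collector endpoint are assumptions you should verify against the Langtrace docs and your own setup:

# Assumes: pip install opentelemetry-exporter-otlp-proto-http
from langtrace_python_sdk import langtrace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Assumption: custom_remote_exporter routes spans to your own
# OpenTelemetry collector instead of the Langtrace cloud.
exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
langtrace.init(custom_remote_exporter=exporter, batch=True)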
Let's take a look at how to trace with Langtrace and Gemini on Google AI and Vertex AI. All the code is in main.py.
Setup
First, sign up for Langtrace and create a project:
Then, create an API key:
Set it to an environment variable:
export LANGTRACE_API_KEY=your-langtrace-api-key
It’s also a good idea to create a Python virtual environment:
python -m venv .venv
source .venv/bin/activate
Install Langtrace:
pip install langtrace-python-sdk
Langtrace and Gemini on Google AI
Let’s now look at how to trace LLM calls with Langtrace and Gemini running on Google AI.
First, get an API key for Gemini and set it to an environment variable:
export GEMINI_API_KEY=your-gemini-api-key
Install the Google AI Python SDK for Gemini:
pip install google-generativeai
Now, you can initialize Langtrace and Gemini:
import os
from langtrace_python_sdk import langtrace  # Must precede any LLM module imports
import google.generativeai as genai

langtrace.init(api_key=os.environ["LANGTRACE_API_KEY"])
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
And generate some content with Gemini on Google AI:
def generate_googleai_1():
    response = model.generate_content("What is Generative AI?")
    print(response.text)

    response = model.generate_content("Why is sky blue?")
    print(response.text)
Run it:
python main.py generate_googleai_1
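By the way, running main.py with a function name like this implies a small dispatcher at the bottom of main.py. The actual entry point in the repo may differ, but a minimal sketch could look like this:

import sys

if __name__ == "__main__":
    # Look up the function named on the command line and call it,
    # e.g. `python main.py generate_googleai_1`.
    function_name = sys.argv[1]
    globals()[function_name]()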
In a few seconds, you’ll see traces for two LLM calls:
You can also see more details within each trace:
Grouping traces
In the previous example, the two traces for the two LLM calls were displayed individually. Sometimes, it's useful to group similar calls in the same trace. You can do that with the @with_langtrace_root_span decorator:
@with_langtrace_root_span("generate_googleai")
def generate_googleai_2():
response = model.generate_content("What is Generative AI?")
print(response.text)
response = model.generate_content("Why is sky blue?")
print(response.text)
Run it:
python main.py generate_googleai_2
You’ll now see traces grouped together:
You can also see more details within each trace:
Langtrace and Gemini on Vertex AI
You can do everything I explained with Gemini on Vertex AI as well. Let's take a quick look at how.
Make sure gcloud is set up with your Google Cloud project, and set the project as an environment variable as well:
gcloud config set core/project your-google-cloud-project-id
export GOOGLE_CLOUD_PROJECT_ID=your-google-cloud-project-id
Make sure you’re logged in:
gcloud auth application-default login
Install the Vertex AI Python SDK for Gemini:
pip install google-cloud-aiplatform
Now, generate some content with Gemini on Vertex AI:
@with_langtrace_root_span("generate_vertexai")
def generate_vertexai():
vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT_ID"], location="us-central1")
model = GenerativeModel("gemini-1.5-flash-002")
response = model.generate_content("What is Generative AI?")
print(response.text)
response = model.generate_content("Why is sky blue?")
print(response.text)
Run it:
python main.py generate_vertexai
In a few seconds, you’ll see traces for the two LLM calls:
Nice!
Metrics
Last but not least, after sending traces, if you switch to the Metrics tab, you can see metrics on token counts, costs, and more:
Conclusion
Langtrace is quite useful for tracing your LLM calls and getting some basic metrics. It can also be used to manage prompts and run evaluations, which I might talk about in a future blog post.
Here are some links for further reading: