In my previous post on DeepEval and Vertex AI, I introduced DeepEval, an open-source evaluation framework for LLMs. I also demonstrated how to use Gemini (on Vertex AI) as an LLM Judge in DeepEval, replacing the default OpenAI judge to evaluate outputs from other LLMs. At the time, the Gemini integration with DeepEval wasn’t ideal, so I had to implement my own.
Thanks to the excellent work by Roy Arsan in PR #1493, DeepEval now includes native Gemini integration. Since it’s built on the new unified Google GenAI SDK, DeepEval supports Gemini models running both on Vertex AI and Google AI. Nice!
There are two ways to use Gemini as an LLM Judge in DeepEval: via the command line or directly in your code. Let’s explore both.
Gemini on Vertex AI
The easiest way to use Gemini on Vertex AI with DeepEval is through the command line using the deepeval tool. For example, to use gemini-1.5-pro in your Google Cloud project genai-atamel as the evaluation model:
deepeval set-gemini --model-name="gemini-1.5-pro" \
  --project-id="genai-atamel" \
  --location="us-central1"
Once set, when you create a metric for evaluation, DeepEval will use this Gemini model by default instead of OpenAI:
metric = AnswerRelevancyMetric(threshold=0.8)
Alternatively, you can set the evaluation model directly in your code.
First, create a GeminiModel instance with your Vertex AI parameters:
from deepeval.models import GeminiModel
EVAL_MODEL = "gemini-1.5-pro"
PROJECT_ID = "genai-atamel"
LOCATION = "us-central1"
eval_model = GeminiModel(
    model_name=EVAL_MODEL,
    project=PROJECT_ID,
    location=LOCATION
)
Then, pass the model to your metric:
metric = AnswerRelevancyMetric(
    model=eval_model,
    threshold=0.8
)
You can find a full example in test_answer_relevancy.py in my repository.
Gemini on Google AI
The new integration also supports Gemini via Google AI. The main difference is that you need a Google AI API key instead of a project ID and location.
You can configure it from the command line like this:
deepeval set-gemini --model-name="gemini-1.5-pro" \
  --google-api-key="your-google-ai-api-key"
Or directly in code:
from deepeval.models import GeminiModel
EVAL_MODEL = "gemini-1.5-pro"
GOOGLEAI_API_KEY = "your-google-ai-api-key"
eval_model = GeminiModel(
    model_name=EVAL_MODEL,
    api_key=GOOGLEAI_API_KEY
)
metric = AnswerRelevancyMetric(
    model=eval_model,
    threshold=0.8
)
I’m glad to see native Gemini support in DeepEval now. For more details, check out DeepEval’s Gemini documentation. For examples on other metrics such as summarization, hallucination detection, or the RAG triad, check out my DeepEval tutorial on GitHub, which has been updated with the new Gemini integration.
Happy evaluations with DeepEval and Gemini as LLM Judge!