DeepEval and Vertex AI
Introduction

When you’re working with Large Language Models (LLMs), it’s crucial to have an evaluation framework in place. Only by constantly evaluating and testing your LLM outputs can you tell whether the changes you make to your prompts, or the outputs you get back from the LLM, are actually improvements.
In this blog post, we’ll look into one of those evaluation frameworks, DeepEval, an open-source evaluation framework for LLMs. It allows you to “unit test” LLM outputs in a similar way to Pytest.
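To give a flavour of what that looks like, here is a minimal sketch of a Pytest-style DeepEval test. It assumes the deepeval package is installed; the metric choice, threshold, and example strings are illustrative, and the metric needs an LLM judge configured (by default an OpenAI key) to actually score the output.

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    # Metric that scores how relevant the answer is to the question;
    # the test fails if the score drops below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)

    # A test case wraps the prompt and the LLM's actual output.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="The capital of France is Paris.",
    )

    # assert_test raises (and fails the Pytest run) if any metric fails.
    assert_test(test_case, [metric])
```

Run it like any other test suite, e.g. with `deepeval test run test_example.py` or plain `pytest`.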