Improve the RAG pipeline with RAG triad metrics
In my previous post, RAG Evaluation - A Step-by-Step Guide with DeepEval, I showed how to evaluate a RAG pipeline with the RAG triad metrics using DeepEval and Vertex AI. As a recap, these were the results:
The answer relevancy and faithfulness metrics had perfect 1.0 scores, whereas contextual relevancy was low at 0.29 because we retrieved a lot of irrelevant context:
The score is 0.29 because while the context mentions relevant information such as "The Cymbal Starlight 2024 has a cargo capacity of 13.
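To see where a score like 0.29 comes from: DeepEval's contextual relevancy metric has a judge LLM break the retrieval context into statements, mark each one as relevant or irrelevant to the input, and then take the ratio of relevant to total statements. A minimal sketch of that ratio (the statement counts below are illustrative, not from the actual evaluation):

```python
def contextual_relevancy(relevant_statements: int, total_statements: int) -> float:
    """Contextual relevancy = number of relevant statements / total statements.

    Mirrors the formula DeepEval's ContextualRelevancyMetric reports; in the
    real metric a judge LLM extracts and classifies the statements.
    """
    if total_statements == 0:
        return 0.0
    return relevant_statements / total_statements


# e.g. if only 2 of 7 retrieved statements were relevant to the question:
score = contextual_relevancy(2, 7)
print(round(score, 2))  # → 0.29
```

A low ratio like this is a retrieval problem, not a generation problem: the answer can still be relevant and faithful (hence the 1.0 scores) while most of the retrieved chunks are noise.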