Retrieval-Augmented Generation (RAG) has emerged as a dominant framework for feeding LLMs context beyond the scope of their training data, enabling them to ground their answers in that context and hallucinate less.
However, designing an effective RAG pipeline can be challenging. You need to answer questions such as:
- How should you parse and chunk text documents for embedding? What chunk and overlap sizes should you use?
- What embedding model is best for your use case?
- What retrieval method works most effectively? How many documents should you retrieve by default? Does the retriever actually manage to retrieve the relevant documents?
- Does the generator actually generate content in line with the retrieved context? Which parameters (e.g., model, prompt template, temperature) work best? (See the sketch below for these tuning knobs.)
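To make these knobs concrete, here is a minimal sketch in plain Python. The names (`RAGConfig`, `chunk_text`) and the default values are hypothetical choices for illustration only; a real pipeline would plug in your own embedding model, vector store, and LLM.

```python
from dataclasses import dataclass


@dataclass
class RAGConfig:
    # Chunking knobs: how documents are split before embedding.
    chunk_size: int = 512        # characters (or tokens) per chunk
    chunk_overlap: int = 64      # characters shared between adjacent chunks
    # Retrieval knobs: which model embeds chunks and how many are fetched.
    embedding_model: str = "all-MiniLM-L6-v2"  # hypothetical default
    top_k: int = 5               # number of chunks passed to the generator
    # Generation knobs: how the LLM turns retrieved context into an answer.
    generator_model: str = "gpt-4o-mini"        # hypothetical default
    temperature: float = 0.0


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks with the given overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    config = RAGConfig()
    doc = "RAG grounds LLM answers in retrieved context. " * 40
    chunks = chunk_text(doc, config.chunk_size, config.chunk_overlap)
    print(f"{len(chunks)} chunks of up to {config.chunk_size} chars "
          f"with {config.chunk_overlap}-char overlap")
```

Every one of these values changes what the generator ultimately sees, which is why you want measurements rather than intuition when choosing them.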
The only way to objectively answer these questions is to measure how well the RAG pipeline works. But what exactly do you measure? That is the topic of this blog post.

