Hybrid Retrieval and Reranking in RAG

Hybrid Retrieval and Reranking in RAG: A Dual-Stage Approach to Improve Information Recall and Precision

by: Varadrajan Kunsavalikar

Abstract

Retrieval-Augmented Generation (RAG) systems rely on efficient retrieval mechanisms to fetch relevant information for downstream generative tasks. Traditional retrieval methods, such as BM25 (lexical matching) and dense embeddings (semantic retrieval via cosine similarity), each have inherent limitations: BM25 struggles with semantic understanding, while dense retrieval often overlooks exact keyword matches. To address these challenges, we propose a hybrid retrieval approach that combines BM25 and cosine similarity-based dense retrieval to maximize recall and precision.

In our approach, we first retrieve candidate chunks separately using BM25 and cosine similarity. We then merge the unique chunks from both methods to create a diverse and enriched candidate pool. To further refine the selection, we employ a cross-encoder reranking model from the Sentence Transformers library, which evaluates query-chunk pairs more contextually, leading to improved relevance scoring. This additional reranking step significantly enhances true positive retrievals while reducing false positives.
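
As an illustration only (not the implementation used in this paper), the two stages described above can be sketched in plain Python. The function names (`bm25_scores`, `merge_unique`, `hybrid_rerank`), the BM25 parameters `k1=1.5` and `b=0.75`, the hand-written two-dimensional "embeddings", and the token-overlap scorer are all assumptions made for this sketch. In a real pipeline the vectors would come from an embedding model and the scorer would be a cross-encoder.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score every chunk against the query with the Okapi BM25 formula."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def merge_unique(first, second):
    """Union of two candidate lists, keeping first-seen order."""
    return list(dict.fromkeys(first + second))


def hybrid_rerank(query, chunks, query_vec, chunk_vecs, score_pairs, k=2, final_k=2):
    """Stage 1: BM25 and cosine retrieval, merged. Stage 2: pairwise reranking."""
    lexical = top_k(bm25_scores(query, chunks), k)
    dense = top_k([cosine(query_vec, v) for v in chunk_vecs], k)
    candidates = merge_unique(lexical, dense)
    pair_scores = score_pairs([(query, chunks[i]) for i in candidates])
    ranked = sorted(zip(candidates, pair_scores), key=lambda p: p[1], reverse=True)
    return [i for i, _ in ranked[:final_k]]


def overlap_scorer(pairs):
    """Toy stand-in for a cross-encoder: counts shared tokens per pair.
    With Sentence Transformers installed, CrossEncoder(...).predict(pairs)
    could be passed to hybrid_rerank instead."""
    return [len(set(tokenize(q)) & set(tokenize(c))) for q, c in pairs]


# Hypothetical chunks and made-up vectors, for illustration only.
chunks = [
    "BM25 ranks documents by exact keyword overlap",
    "Dense embeddings capture semantic similarity",
    "Cats sleep most of the day",
    "Lexical search matches query terms literally",
]
chunk_vecs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0], [0.6, 0.4]]
query = "keyword search with BM25"
query_vec = [0.1, 0.9]

result = hybrid_rerank(query, chunks, query_vec, chunk_vecs, overlap_scorer)
print([chunks[i] for i in result])
```

Passing the scorer in as a function keeps the retrieval stage independent of the reranking model, so the toy scorer can be swapped for a real cross-encoder without touching the merge logic.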

In our experiments, the Hybrid Retrieval + Reranking pipeline outperforms the individual retrieval methods in both precision and recall. Our results indicate that this dual-stage approach is particularly effective at improving retrieval quality in RAG-based applications, making it suitable for enterprise search, document processing, and conversational AI. This paper details the methodology, implementation, and key insights into optimizing retrieval strategies for modern AI workflows.
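
The evaluation protocol is not described in this excerpt. As a reminder of what the two metrics measure, here is a minimal sketch of the standard set-based definitions. The function name `precision_recall` and the chunk IDs are hypothetical and are not results from the paper.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query.

    precision = true positives / number of retrieved chunks
    recall    = true positives / number of relevant chunks
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    true_positives = len(retrieved_set & relevant_set)
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical chunk IDs: three chunks retrieved, four chunks truly relevant.
precision, recall = precision_recall(retrieved=[0, 3, 1], relevant=[0, 1, 2, 5])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Merging BM25 and dense candidates mainly targets recall, since more relevant chunks enter the pool. The reranking step mainly targets precision, since it pushes false positives out of the final selection.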

Introduction

Problem Statement & Importance of Retrieval in RAG

Retrieval-Augmented Generation (RAG) is a powerful technique that enhances generative AI models by incorporating external knowledge retrieval. It is widely used in applications such as document search, legal and medical text retrieval, customer support automation, and enterprise AI systems, where retrieving accurate and contextually relevant information is critical for downstream processing.

The success of a RAG system depends not only on the generative model but also on the quality of retrieved information. If irrelevant or incomplete data is retrieved, the generated responses will be unreliable. Thus, retrieval plays a fundamental role in ensuring that generative models have access to the most relevant knowledge before generating responses.

To read more from this paper, please download the full PDF: Hybrid Retrieval and Reranking in RAG: A Dual-Stage Approach to Improve Information Recall and Precision.

Frequently Asked Questions

A hybrid retriever combines traditional sparse retrieval methods like BM25 with modern dense retrieval models to balance precision, recall, and efficiency when fetching relevant documents.

Initial retrieval casts a wide net for potential answers. Reranking refines these results by scoring them more deeply, ensuring the most useful information is surfaced to the RAG model.

Hybrid retrieval addresses the weaknesses of using sparse or dense retrieval alone, such as missing semantically relevant documents or surfacing irrelevant keyword matches, by combining the strengths of both.