Written by Max Zeshut
Founder at Agentmelt
A second-stage retrieval step that takes the top N candidates from a vector or keyword search (typically N = 20 to 100) and reorders them with a more accurate but more expensive model. That model is usually a cross-encoder, which reads the query and each candidate together and outputs a single relevance score for the pair. Reranking is the highest-leverage retrieval quality fix for RAG-based agents: in a typical setup, adding a reranker moves precision-at-5 from about 60% to 85% or higher, at a cost of a few cents per query.
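The two-stage flow can be sketched as follows. The first stage supplies a candidate list, and the reranker scores each (query, candidate) pair and keeps the best `top_k`. This is a minimal illustration under stated assumptions: `rerank` and `toy_cross_encoder` are hypothetical names, and the toy scorer (query-term overlap) only stands in for a real cross-encoder or a hosted reranker such as Cohere or Voyage, whose client APIs are not shown here.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_k highest-scoring ones."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Sort by descending score; Python's sort is stable, so ties keep first-stage order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_cross_encoder(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms that appear in doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & d_terms) / len(q_terms)


# Candidates as a first-stage vector or keyword search might return them (illustrative data).
first_stage = [
    "How to update billing address",
    "Reset your password from the login page",
    "Refund policy for annual plans",
    "Password requirements and reset steps",
]

top = rerank("how do I reset my password", first_stage, toy_cross_encoder, top_k=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Keeping `score_fn` as a parameter separates the cheap first stage from the expensive scorer, so swapping the toy function for a real reranker changes only one argument.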
A support agent's vector search returns 50 candidate KB articles per query. Before reranking, the right article appeared in the top 5 only 62% of the time. Adding a Cohere or Voyage reranker that scores all 50 candidates against the customer's question and keeps the top 5 raises top-5 recall to 89%. The agent now answers correctly far more often, and total LLM token cost actually drops because fewer wrong-path retries happen downstream.
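The top-5 figure in this example is a hit rate: the share of queries whose correct article lands within the first k results. A minimal sketch of that metric, with a hypothetical function name and invented toy rankings (not the 62% and 89% numbers above), shows how the before/after comparison is computed over the same candidate sets:

```python
from typing import Sequence


def hit_rate_at_k(rankings: Sequence[Sequence[str]], gold: Sequence[str], k: int = 5) -> float:
    """Fraction of queries whose correct document appears in the first k ranked results."""
    if len(rankings) != len(gold):
        raise ValueError("need exactly one gold document per ranked list")
    if not gold:
        return 0.0
    hits = sum(1 for ranked, answer in zip(rankings, gold) if answer in ranked[:k])
    return hits / len(gold)


# Toy data: one correct article per query.
gold = ["kb-12", "kb-07", "kb-33", "kb-41"]

# First-stage order from vector search.
before = [
    ["kb-12", "kb-02", "kb-09", "kb-15", "kb-20", "kb-21"],
    ["kb-01", "kb-03", "kb-04", "kb-05", "kb-06", "kb-07"],
    ["kb-08", "kb-33", "kb-17", "kb-18", "kb-19", "kb-22"],
    ["kb-10", "kb-11", "kb-13", "kb-14", "kb-16", "kb-41"],
]

# Same candidates after a reranker reorders them (the last query is still missed).
after = [
    ["kb-12", "kb-09", "kb-02", "kb-15", "kb-21", "kb-20"],
    ["kb-07", "kb-01", "kb-03", "kb-04", "kb-05", "kb-06"],
    ["kb-33", "kb-08", "kb-17", "kb-18", "kb-19", "kb-22"],
    ["kb-10", "kb-11", "kb-13", "kb-14", "kb-16", "kb-41"],
]

print("top-5 hit rate before:", hit_rate_at_k(before, gold))
print("top-5 hit rate after: ", hit_rate_at_k(after, gold))
```

Because reranking only reorders the first-stage candidates, it can never recover an article the vector search failed to return; it can only move a returned article into the top k.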