Hybrid Search
Hybrid search is a retrieval method that combines semantic search (vector/embedding-based) with lexical search (keyword/BM25-based) to find relevant information. By merging both approaches, hybrid search typically retrieves more relevant results than either method alone.
How it works
Semantic search excels at understanding meaning but can miss specific terms, product names, and exact phrases. Keyword search (BM25) excels at matching specific terms but misses paraphrased content. Hybrid search combines both:
1. **Semantic search**: Finds content with similar meaning to the query
2. **BM25 keyword search**: Finds content containing the exact terms
3. **Reciprocal Rank Fusion (RRF)**: Merges and re-ranks both result sets
This means searching for "Chatsy Pro plan pricing" would find documents about "Chatsy Pro subscription cost" (semantic) AND documents containing the exact term "Pro plan" (keyword). Neither search alone would find both.
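To make the keyword side concrete, here is a minimal sketch of BM25 scoring in plain Python. The three documents, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are illustrative assumptions, not a description of how any particular product indexes content.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical knowledge-base snippets.
docs = [
    "Chatsy Pro plan pricing and limits",
    "How much does the Chatsy Pro subscription cost",
    "Resetting your password",
]
scores = bm25_scores("Chatsy Pro plan pricing", docs)
```

The first document matches every query term and scores highest. The second, which paraphrases "plan pricing" as "subscription cost", only matches on "Chatsy" and "Pro" and scores much lower. This is the gap that the semantic side of hybrid search is meant to cover.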
Why it matters
How Chatsy uses hybrid search
Real-world examples
Key takeaways
Frequently asked questions
Is hybrid search better than vector search alone?
Yes, in most cases. Studies show hybrid search improves recall by 10-30% compared to vector search alone, especially for queries containing specific terms, product names, or technical jargon that semantic search can miss.
Does hybrid search slow down the chatbot?
The additional latency is negligible, typically 10-50 milliseconds. Both searches run in parallel, so the added time comes mainly from merging the two result lists. The accuracy improvement far outweighs this small latency cost.
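A minimal sketch of the parallel execution described above, using Python's standard `concurrent.futures` module. The functions `semantic_search` and `keyword_search` are hypothetical stand-ins that return fixed document IDs; a real system would query a vector index and a BM25 index instead.

```python
from concurrent.futures import ThreadPoolExecutor


def semantic_search(query: str) -> list[str]:
    # Hypothetical stand-in for a vector index lookup.
    return ["doc_b", "doc_a", "doc_c"]


def keyword_search(query: str) -> list[str]:
    # Hypothetical stand-in for a BM25 index lookup.
    return ["doc_a", "doc_d"]


def run_both(query: str) -> tuple[list[str], list[str]]:
    """Run both searches concurrently and return (semantic, keyword) rankings."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        semantic_future = pool.submit(semantic_search, query)
        keyword_future = pool.submit(keyword_search, query)
        # Waiting on both futures takes roughly as long as the slower search.
        return semantic_future.result(), keyword_future.result()


semantic_hits, keyword_hits = run_both("Chatsy Pro plan pricing")
```

Because the two lookups overlap in time, total retrieval latency is roughly that of the slower search plus the merge step, rather than the sum of both.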
When should I use hybrid search instead of vector search alone?
Almost always, if your platform supports it. For customer support, hybrid search is generally a better choice than vector-only search because support queries frequently contain specific product names, error codes, and technical terms that keyword search handles better than semantic search.
What is Reciprocal Rank Fusion (RRF)?
RRF is an algorithm that combines ranked result lists from multiple search methods. It scores each result based on its rank position in each list, using 1/(k + rank), where k is a smoothing constant (commonly 60), then sums the scores across lists. Results that rank highly in both keyword and semantic search get the highest combined scores, surfacing the most relevant content.
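The fusion step can be sketched in a few lines of Python. This version uses the common form of the score, 1/(k + rank), with the smoothing constant k = 60 from the original RRF paper; setting `k=0` reduces it to plain 1/rank. The document IDs and the function name `reciprocal_rank_fusion` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result in each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two searches.
semantic = ["doc_b", "doc_a", "doc_c"]
keyword = ["doc_a", "doc_d"]

fused = reciprocal_rank_fusion([semantic, keyword])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `doc_a` appears in both lists (rank 2 in semantic, rank 1 in keyword), so its summed score places it ahead of `doc_b`, which tops only the semantic list. RRF works on rank positions alone, so the raw BM25 and vector similarity scores never need to be put on a common scale.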