Hybrid Search Explained: Best of Both Worlds

Why combining semantic search with keyword matching gives the most accurate results for AI agents. A deep dive into implementation and optimization.

Chatsy Team
Engineering
November 20, 2024 (updated January 15, 2026)
8 min read

If you've built a RAG (Retrieval-Augmented Generation) system, you've faced the fundamental search dilemma: semantic search or keyword search? After extensive testing and production experience at Chatsy, we've learned that the answer is definitively both. Hybrid search combines the strengths of each approach while minimizing their weaknesses.

In this technical deep dive, we'll explore why pure approaches fall short, how hybrid search works, and how to implement it effectively in your own AI applications.

The Problem with Pure Approaches

Before understanding why hybrid search works so well, let's examine why neither pure semantic search nor pure keyword search is sufficient on its own.

Semantic Search: Powerful but Imprecise

Semantic search uses embeddings to find conceptually similar content. When a user types a query, the system converts it to a vector representation, then finds documents with similar vector representations. This approach excels at understanding meaning and intent.
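
To make "similar vector representations" concrete, here is a minimal sketch of cosine similarity, the measure most embedding search relies on. The two-dimensional vectors are toy values chosen for illustration (real embeddings have hundreds or thousands of dimensions), and `rankBySimilarity` is a hypothetical helper, not part of any library. pgvector's `<=>` operator, used later in this article, returns cosine distance, which is 1 minus this value.

```typescript
// Cosine similarity between two equal-length vectors:
// 1 means same direction, 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank documents by similarity to the query, best first.
function rankBySimilarity(
  queryVector: number[],
  docs: { id: string; vector: number[] }[]
): { id: string; similarity: number }[] {
  return docs
    .map(d => ({ id: d.id, similarity: cosineSimilarity(queryVector, d.vector) }))
    .sort((x, y) => y.similarity - x.similarity);
}

// Toy data: the query points almost the same way as the "cancel" document.
const toyRanking = rankBySimilarity(
  [1, 0],
  [
    { id: "billing-faq", vector: [0.1, 0.9] },
    { id: "cancel-subscription", vector: [0.9, 0.1] }
  ]
);
```

Nothing in this calculation looks at the literal words of the query, which is why paraphrases match and why exact identifiers can get lost.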

Where semantic search shines:

  • Finding paraphrases: "cancel subscription" matches "terminate membership" because they mean the same thing
  • Understanding intent: "I'm unhappy with the service" correctly surfaces content about complaints and refunds
  • Cross-lingual matching: With a multilingual embedding model, queries in one language can match documents in another
  • Handling misspellings: Minor typos often still produce embeddings close to the intended meaning

Where semantic search fails:

  • Exact matches: Searching for "error code E-1234" might return general error handling content rather than documentation for that specific code. The model sees "error" and finds similar concepts, missing the crucial identifier.

  • Proper nouns: "John Smith account issue" might match any content about account issues with people, not specifically John Smith's account.

  • Technical identifiers: "OAuth2", "JWT", "RS256" might get conflated with general authentication concepts rather than matching the specific technologies.

  • Rare terms: Highly specialized vocabulary may not have strong embedding representations, leading to poor matches.

These failures aren't bugs—they're inherent to how embedding models work. They're trained to capture semantic similarity, not exact matching.

Keyword Search: Precise but Literal

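Keyword search (TF-IDF, BM25) takes the opposite approach. It looks for exact term matches and weights results by term frequency (how often a term appears in a document) and inverse document frequency (how rare the term is across the whole collection).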
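
The scoring idea can be sketched in a few lines. The following is a minimal BM25 implementation over an in-memory array of strings, assuming a naive `tokenize` helper (hypothetical, with no stemming or stop words) and the commonly used defaults `k1 = 1.2` and `b = 0.75`. It illustrates the formula only; PostgreSQL's `ts_rank`, which the implementation below uses, is a different ranking function.

```typescript
// Split text into lowercase tokens, keeping hyphenated identifiers like "e-1234" whole.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+(?:-[a-z0-9]+)*/g) ?? [];
}

// Score every document against the query with BM25.
// k1 controls term-frequency saturation; b controls length normalization.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const avgLength =
    tokenized.reduce((sum, tokens) => sum + tokens.length, 0) / Math.max(docs.length, 1);
  const queryTerms = Array.from(new Set(tokenize(query)));

  return tokenized.map(tokens => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = tokens.filter(t => t === term).length;
      if (tf === 0) continue; // no exact match, no contribution
      // Document frequency: how many documents contain the term at all.
      const df = tokenized.filter(ts => ts.indexOf(term) !== -1).length;
      // Rare terms get a larger IDF, so they dominate the score.
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - b + b * (tokens.length / avgLength);
      score += (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

const sampleDocs = [
  "Error code E-1234 means the payment gateway timed out",
  "General error handling guide for payment errors",
  "How to cancel your subscription"
];
const exactScores = bm25Scores("E-1234", sampleDocs);
const mixedScores = bm25Scores("error E-1234", sampleDocs);
```

In `exactScores` only the first document scores above zero, because it is the only one containing the identifier. In `mixedScores` the second document picks up a small score from the common word "error", but the rare identifier still keeps the first document on top.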

Where keyword search excels:

  • Exact term matching: "E-1234" finds exactly that term
  • Rare word importance: Uncommon terms get high weight, which is useful for technical queries
  • Speed and simplicity: No embedding generation required
  • Predictability: Results are deterministic and explainable

Where keyword search fails:

  • Synonyms: A query for "cancel" won't find a document that only says "terminate"
  • Typos: "refnd" won't match "refund"
  • Linguistic variations: "running" might not match "run" without stemming
  • Context blindness: "apple" matches both fruit and technology company content equally

These limitations make pure keyword search frustrating for users who don't know the exact terminology your documentation uses.

The Hybrid Approach: Getting the Best of Both Worlds

Hybrid search runs both semantic and keyword searches in parallel, then combines the results intelligently. This approach captures the conceptual understanding of semantic search while maintaining the precision of keyword matching.

Here's the flow:

User Query: "How to cancel my Pro subscription?"
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
  Semantic Search          Keyword Search
        │                       │
   Finds content           Finds content
   about canceling         with exact terms
   and subscriptions       "Pro", "cancel"
        │                       │
        └───────────┬───────────┘
                    ▼
            Combine Results
            (Reciprocal Rank Fusion)
                    │
                    ▼
            Re-rank with AI
                    │
                    ▼
            Final Results

The key insight is that documents appearing in both result sets are likely highly relevant. A document that's semantically similar to the query AND contains the exact keywords is almost certainly what the user wants.

Implementation Guide

Let's walk through a complete implementation of hybrid search suitable for production use.

Step 1: Dual Search Execution

Run both searches in parallel for optimal performance:

typescript
// Row shapes returned by the two raw queries
type SemanticRow = { id: string; content: string; semantic_distance: number };
type KeywordRow = { id: string; content: string; keyword_score: number };

async function hybridSearch(query: string, chatbotId: string) {
  // Generate embedding for semantic search
  const queryEmbedding = await embed(query);

  // Execute both searches in parallel
  const [semanticResults, keywordResults] = await Promise.all([
    // Semantic search with pgvector (<=> is cosine distance)
    prisma.$queryRaw<SemanticRow[]>`
      SELECT id, content,
             embedding <=> ${queryEmbedding}::vector AS semantic_distance
      FROM chunks
      WHERE chatbot_id = ${chatbotId}
      ORDER BY semantic_distance
      LIMIT 20
    `,
    // Keyword search with PostgreSQL full-text search
    prisma.$queryRaw<KeywordRow[]>`
      SELECT id, content,
             ts_rank(to_tsvector('english', content),
                     plainto_tsquery('english', ${query})) AS keyword_score
      FROM chunks
      WHERE chatbot_id = ${chatbotId}
        AND to_tsvector('english', content) @@ plainto_tsquery('english', ${query})
      ORDER BY keyword_score DESC
      LIMIT 20
    `
  ]);

  return { semanticResults, keywordResults };
}

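Note that semantic search returns a distance (lower is better) while keyword search returns a score (higher is better). The two scales are not directly comparable, which is why the fusion step works with rank positions rather than raw scores: each list only needs to be sorted best-first.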

Step 2: Reciprocal Rank Fusion (RRF)

RRF is an elegant algorithm for combining ranked lists from different sources. It doesn't require score normalization because it works purely with rankings:

typescript
function reciprocalRankFusion(
  resultSets: { id: string; score: number }[][],
  k: number = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();

  // Process each result set
  resultSets.forEach(resultSet => {
    resultSet.forEach((result, rank) => {
      // RRF formula: 1 / (k + rank + 1)
      const rrfScore = 1 / (k + rank + 1);
      scores.set(result.id, (scores.get(result.id) || 0) + rrfScore);
    });
  });

  // Convert to sorted array
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

The k parameter (typically 60) controls how quickly scores decay with rank. Higher k values flatten the curve, so lower-ranked results contribute relatively more; lower values concentrate the weight on the top few ranks.
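
A small worked example helps here. The sketch below is a standalone variant that fuses plain ID lists; the document IDs and the `rrf` and `topVsTenth` names are made up for illustration. It shows that a document ranked second and first in the two lists outscores a document that tops only one list, and how k changes the gap between ranks.

```typescript
// Fuse ranked ID lists (best first). Each appearance adds 1 / (k + rank),
// with rank counted from 1.
function rrf(rankedLists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const semanticRanking = ["doc-a", "doc-b", "doc-c"];
const keywordRanking = ["doc-b", "doc-d"];
const fused = rrf([semanticRanking, keywordRanking]);
// doc-b: 1/62 + 1/61  (appears in both lists)
// doc-a: 1/61         (top of one list only)
// doc-d: 1/62
// doc-c: 1/63

// How many times more a rank-1 hit is worth than a rank-10 hit for a given k.
const topVsTenth = (k: number): number => (1 / (k + 1)) / (1 / (k + 10));
```

With k = 60, a rank-1 hit is worth only about 1.15 times a rank-10 hit (70/61), so agreement between lists matters more than position within one list. With k = 1 the ratio is 5.5, and the top of each list dominates.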

Step 3: AI Re-ranking (Optional but Powerful)

For the highest accuracy, re-rank the top results with a cross-encoder model or dedicated reranking API:

typescript
async function rerank(
  query: string,
  documents: { id: string; content: string }[]
): Promise<{ id: string; score: number }[]> {
  // Use Cohere's reranking API (assumes an initialized client named `cohere`)
  const response = await cohere.rerank({
    model: "rerank-english-v3.0",
    query,
    documents: documents.map(d => d.content),
    topN: 10
  });

  // Each result carries the index of the input document it refers to.
  // The TypeScript SDK exposes the score as camelCase `relevanceScore`.
  return response.results.map(r => ({
    id: documents[r.index].id,
    score: r.relevanceScore
  }));
}

Re-ranking adds latency (typically 100-200ms) but significantly improves relevance, especially for ambiguous queries.
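
Because the re-ranker is an extra network call, one option is to give it a time budget and fall back to the fused order when it is slow or unavailable. The sketch below is one possible way to do that, not something the pipeline in this article requires; `withTimeout`, `rerankWithFallback`, and the 300ms default are all assumptions made for illustration.

```typescript
type Scored = { id: string; score: number };

// Reject if the wrapped promise has not settled within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), timeoutMs);
    work.then(
      value => {
        clearTimeout(timer);
        resolve(value);
      },
      error => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}

// Try the re-ranker; on timeout or error, keep the fused (RRF) order instead
// of failing the whole search.
async function rerankWithFallback(
  fused: Scored[],
  rerankFn: (candidates: Scored[]) => Promise<Scored[]>,
  timeoutMs = 300
): Promise<Scored[]> {
  try {
    return await withTimeout(rerankFn(fused), timeoutMs);
  } catch {
    return fused;
  }
}
```

The trade-off is that a timed-out request returns slightly less precise results, which is usually preferable to returning none.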

Complete Pipeline

Putting it all together:

typescript
async function search(query: string, chatbotId: string, topK: number = 5) {
  // 1. Get results from both search methods
  const { semanticResults, keywordResults } = await hybridSearch(query, chatbotId);

  // 2. Shape both lists for fusion. RRF only uses each list's order (both
  //    arrive sorted best-first); the scores are carried along for reference.
  const normalizedSemantic = semanticResults.map(r => ({
    id: r.id,
    score: 1 - r.semantic_distance // Convert cosine distance to similarity
  }));
  const normalizedKeyword = keywordResults.map(r => ({
    id: r.id,
    score: r.keyword_score
  }));

  // 3. Apply RRF
  const fusedResults = reciprocalRankFusion([normalizedSemantic, normalizedKeyword]);

  // 4. Fetch full documents for top candidates
  const candidateIds = fusedResults.slice(0, 20).map(r => r.id);
  if (candidateIds.length === 0) return []; // nothing matched either search
  const documents = await prisma.chunk.findMany({
    where: { id: { in: candidateIds } },
    select: { id: true, content: true }
  });

  // 5. Re-rank with AI
  const rerankedResults = await rerank(query, documents);

  // 6. Return top K
  return rerankedResults.slice(0, topK);
}

Performance Results

We benchmarked hybrid search against pure approaches on 1,000 real customer queries across diverse domains:

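| Method          | Precision@5 | Recall@10 | MRR  | Latency (P95) |
|-----------------|-------------|-----------|------|---------------|
| Semantic Only   | 72%         | 68%       | 0.65 | 85ms          |
| Keyword Only    | 58%         | 71%       | 0.52 | 25ms          |
| Hybrid (RRF)    | 84%         | 82%       | 0.78 | 95ms          |
| Hybrid + Rerank | 91%         | 89%       | 0.86 | 280ms         |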

Key observations:

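  • Precision@5 rose from 72% (semantic-only) to 84% with hybrid RRF: 12 points, or about 17% relative
  • Recall@10 rose from 68% to 82%: 14 points, so hybrid finds more of the relevant documents
  • MRR (Mean Reciprocal Rank) rose from 0.65 to 0.78, a 20% relative gain, so relevant documents appear higher
  • Latency stays acceptable: RRF adds about 10ms at P95 over semantic-only because the two searches run in parallel, while re-ranking raises P95 to 280ms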
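
If you want to run the same kind of comparison on your own queries, the three quality metrics are straightforward to compute once you have relevance labels. The sketch below uses the standard definitions; it is not the harness behind the numbers above, the function names are our own, and it assumes each ranked list contains no duplicate IDs.

```typescript
// Precision@k: fraction of the top-k slots filled with relevant results.
function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (k <= 0) return 0;
  const hits = ranked.slice(0, k).filter(id => relevant.has(id)).length;
  return hits / k;
}

// Recall@k: fraction of all relevant documents that appear in the top k.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter(id => relevant.has(id)).length;
  return hits / relevant.size;
}

// MRR: reciprocal rank of the first relevant result (0 if none),
// averaged over all queries.
function meanReciprocalRank(
  runs: { ranked: string[]; relevant: Set<string> }[]
): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((sum, run) => {
    const firstHit = run.ranked.findIndex(id => run.relevant.has(id));
    return sum + (firstHit === -1 ? 0 : 1 / (firstHit + 1));
  }, 0);
  return total / runs.length;
}

// Toy example: two of the three relevant documents are retrieved,
// and the first relevant one sits at rank 2.
const ranked = ["a", "b", "c", "d", "e"];
const relevant = new Set<string>(["b", "e", "z"]);
const p5 = precisionAtK(ranked, relevant, 5); // 2 relevant in 5 slots
const r5 = recallAtK(ranked, relevant, 5);    // 2 of 3 relevant found
const mrr = meanReciprocalRank([{ ranked, relevant }]);
```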

When to Use Different Approaches

Not every query benefits equally from each approach. Consider adapting your strategy based on query characteristics:

| Query Type           | Recommended Approach    | Why                                                                   |
|----------------------|-------------------------|-----------------------------------------------------------------------|
| Conceptual questions | Semantic-heavy (70/30)  | "How does authentication work?" benefits from understanding concepts  |
| Specific terms/codes | Keyword-heavy (30/70)   | "Error JWT-401" needs exact matching                                  |
| General questions    | Balanced hybrid (50/50) | "How do I reset password?" needs both                                 |
| Multi-part queries   | Hybrid + rerank         | "Cancel Pro plan and get refund" has multiple intents                 |

You can implement dynamic weighting by analyzing the query before search:

typescript
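function getSearchWeights(query: string): { semantic: number; keyword: number } {
  // Acronyms and numeric codes are matched case-sensitively: with the `i` flag,
  // [A-Z]{2,} would match any two letters and mark every query as specific.
  const hasSpecificTerms =
    /[A-Z]{2,}|\d{3,}/.test(query) || /error\s*code/i.test(query);
  // \b keeps "however" from counting as "how"
  const isConceptual = /^(what|how|why|explain)\b/i.test(query);

  if (hasSpecificTerms) return { semantic: 0.3, keyword: 0.7 };
  if (isConceptual) return { semantic: 0.7, keyword: 0.3 };
  return { semantic: 0.5, keyword: 0.5 };
}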
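
The weights still have to be applied somewhere. One possible approach, shown here as a sketch rather than as the method used in production, is a weighted variant of RRF in which each list's contribution is multiplied by its weight. The `weightedRrf` name, the `RankedList` type, and the document IDs are assumptions for illustration.

```typescript
type RankedList = { ids: string[]; weight: number };

// Weighted RRF: each appearance adds weight / (k + rank), rank counted from 1.
// With equal weights the ordering is the same as plain RRF.
function weightedRrf(lists: RankedList[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// The two searches disagree about which document is best.
const semanticIds = ["concept-guide", "exact-code-page"];
const keywordIds = ["exact-code-page", "concept-guide"];

// Keyword-heavy weights (30/70) favor the keyword search's top hit.
const keywordHeavy = weightedRrf([
  { ids: semanticIds, weight: 0.3 },
  { ids: keywordIds, weight: 0.7 }
]);

// Semantic-heavy weights (70/30) favor the semantic search's top hit.
const semanticHeavy = weightedRrf([
  { ids: semanticIds, weight: 0.7 },
  { ids: keywordIds, weight: 0.3 }
]);
```

In this example plain RRF would score both documents identically, so the weights act as the tie-breaker: `keywordHeavy` puts "exact-code-page" first and `semanticHeavy` puts "concept-guide" first.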

Practical Optimization Tips

After running hybrid search in production, here are our key learnings:

1. Tune the fusion parameters: The default 50/50 split works well, but test with your actual queries. Some domains benefit from different ratios.

2. Cache embeddings aggressively: Query embedding generation is the slowest part of semantic search. Cache recent queries to avoid redundant computation.

3. Maintain your indexes: Full-text indexes need periodic maintenance. Run REINDEX during low-traffic periods, or use REINDEX CONCURRENTLY on PostgreSQL 12 and later to avoid blocking writes while the index is rebuilt.

4. Monitor result quality: Track which results users actually click or which answers resolve issues. Use this data to tune your approach.

5. Consider query expansion: Before searching, expand queries to cover synonyms and related terms.
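
As an illustration of tip 2, here is a minimal sketch of an in-memory cache for query embeddings. It assumes a single process and an embedding function with the shape `(text) => Promise<number[]>`; `LruCache`, `withEmbeddingCache`, and the 1,000-entry default are hypothetical choices, and a shared store would be needed if you run several instances.

```typescript
// Small LRU cache built on Map's insertion order: the first key is the oldest.
class LruCache<V> {
  private entries = new Map<string, V>();

  constructor(private maxEntries: number) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Wrap an embedding function so repeated queries skip the model call.
function withEmbeddingCache(
  embedFn: (text: string) => Promise<number[]>,
  maxEntries = 1000
): (text: string) => Promise<number[]> {
  const cache = new LruCache<number[]>(maxEntries);
  return async (text: string) => {
    const key = text.trim();
    const cached = cache.get(key);
    if (cached !== undefined) return cached;
    const vector = await embedFn(key);
    cache.set(key, vector); // only successful results are cached
    return vector;
  };
}
```

Usage would look like `const cachedEmbed = withEmbeddingCache(embed)`, with `cachedEmbed(query)` called in place of `embed(query)`.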

Conclusion

Pure semantic search revolutionized information retrieval, but it's not the complete answer. Hybrid search combines the conceptual understanding of embeddings with the precision of keyword matching, delivering significantly better results for real-world queries.

At Chatsy, hybrid search is the default for all AI agents. Your chatbots automatically benefit from both approaches, optimized through thousands of hours of production experience.

For more on building effective RAG systems, check out our technical guides on query expansion and our pgvector migration story.
