Vector Search: How AI Chatbots Find Answers
Vector search powers modern AI chatbots. Learn how it works, why it beats keyword search, and how chatbots understand what you mean.

When you ask an AI chatbot "How do I cancel?", how does it know you mean your subscription and not a meeting? The answer is vector search---a technology that understands meaning, not just keywords.
TL;DR:
- Vector search converts text into numerical embeddings that capture meaning, so "cancel my plan" matches "terminate subscription" even without shared keywords --- solving the biggest limitation of traditional keyword search.
- For production use, combine vector search with keyword search (hybrid search) to get both semantic understanding and exact-match precision, then rerank results for best relevance.
- Optimal setup: chunk documents into 500-1000 tokens with 100-200 token overlap, use a quality embedding model (e.g., OpenAI text-embedding-3), and store vectors in a specialized database like pgvector.
The Problem with Keyword Search
Traditional search matches words. Ask "How do I cancel?" and it looks for documents containing "cancel."
But what if your help docs say "terminate subscription" or "end your plan"? Keyword search misses these completely.
| You Ask | Doc Contains | Keyword Match? |
|---|---|---|
| "cancel" | "cancel" | Yes |
| "cancel" | "terminate" | No |
| "cancel" | "end subscription" | No |
| "cancel my plan" | "how to cancel" | Yes |
| "stop my subscription" | "cancel plan" | No |
This is why old chatbots felt so frustrating---slight wording differences broke everything.
Enter Vector Search
Vector search converts text into embeddings---numerical representations that capture meaning. Instead of matching characters, it places text in a high-dimensional mathematical space where proximity corresponds to semantic similarity.
How Embeddings Are Generated
An embedding model is a neural network --- typically a transformer architecture --- trained on massive text corpora. During training, the model learns to map text into a fixed-length vector such that semantically similar inputs land near each other in vector space.
Here is what happens step by step when you embed a sentence:
- Tokenization: The input text is split into subword tokens. "How do I cancel my subscription?" might become ["How", "do", "I", "cancel", "my", "sub", "##scription", "?"].
- Token embedding lookup: Each token is mapped to a learned vector from a vocabulary table.
- Contextual encoding: The transformer passes token vectors through multiple self-attention layers, allowing each token to incorporate context from every other token. The word "cancel" in "cancel my subscription" will have a different internal representation than "cancel" in "cancel the meeting" because the surrounding tokens differ.
- Pooling: The final layer outputs one vector per token. These are collapsed into a single fixed-length vector for the whole input, typically via mean pooling (averaging all token vectors) or by taking the special [CLS] token output.
- Normalization: The resulting vector is often L2-normalized so that cosine similarity reduces to a simple dot product, which is faster to compute.
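The pooling and normalization steps above can be sketched in a few lines of numpy. The token vectors here are made-up toy values (real models produce hundreds of dimensions per token), so this only illustrates the mechanics:

```python
import numpy as np

# Hypothetical per-token outputs from the transformer's final layer:
# 5 tokens, each a 4-dimensional contextual vector (toy values).
token_vectors = np.array([
    [0.2, 0.4, -0.1, 0.3],
    [0.1, 0.5, -0.2, 0.2],
    [0.3, 0.3,  0.0, 0.4],
    [0.2, 0.6, -0.1, 0.1],
    [0.1, 0.4, -0.3, 0.2],
])

# Mean pooling: average all token vectors into one fixed-length vector
sentence_vector = token_vectors.mean(axis=0)

# L2 normalization: scale to unit length so cosine similarity
# reduces to a plain dot product
embedding = sentence_vector / np.linalg.norm(sentence_vector)

print(embedding.shape)            # one vector for the whole sentence
print(np.linalg.norm(embedding))  # unit length after normalization
```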
The output is a list of floating-point numbers --- the embedding:
"How do I cancel?" -> [0.23, -0.45, 0.67, 0.12, ...] (1536 dimensions)
"Terminate my subscription" -> [0.21, -0.43, 0.69, 0.14, ...]
"What's the weather?" -> [-0.56, 0.34, -0.12, 0.78, ...]
Notice how similar meanings have similar numbers? That is the core insight.
Understanding Vector Dimensions
Each number in an embedding vector represents a learned feature of the text. A 1536-dimensional embedding means the model encodes 1536 distinct aspects of meaning. You can think of dimensions loosely as axes in a coordinate system:
- Some dimensions might capture topic (business vs. technology vs. sports).
- Others might encode sentiment, formality, or specificity.
- Most dimensions encode abstract features that do not map to any single human concept.
Higher-dimensional embeddings capture more nuance but require more storage and slow down comparisons. A single 1536-dimensional vector of 32-bit floats consumes 6 KB of memory. At one million documents, that is roughly 6 GB just for the vectors.
| Dimensions | Storage per Vector | Storage for 1M Vectors | Tradeoff |
|---|---|---|---|
| 384 | 1.5 KB | ~1.5 GB | Fast, less nuance |
| 768 | 3 KB | ~3 GB | Balanced |
| 1536 | 6 KB | ~6 GB | Good nuance, moderate cost |
| 3072 | 12 KB | ~12 GB | Maximum nuance, highest cost |
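The table's figures come straight from dimensions times bytes per float. A quick sanity check (the function name is illustrative; the figures exclude index overhead and round GB decimally):

```python
def vector_storage_bytes(dimensions: int, bytes_per_float: int = 4) -> int:
    """Raw storage for one embedding: dimensions x bytes per 32-bit float."""
    return dimensions * bytes_per_float

for dims in (384, 768, 1536, 3072):
    per_vec = vector_storage_bytes(dims)
    per_million = per_vec * 1_000_000
    print(f"{dims:>5} dims: {per_vec / 1024:.1f} KB/vector, "
          f"~{per_million / 1e9:.1f} GB per 1M vectors")
```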
How Similarity Search Works
To find relevant content, the system compares the query vector against all stored document vectors using a distance metric. The most common metric is cosine similarity.
Cosine Similarity: The Math (Simplified)
Cosine similarity measures the angle between two vectors, ignoring their magnitude. Two vectors pointing in the same direction have a cosine similarity of 1.0 (identical meaning); orthogonal vectors score 0.0 (unrelated); opposite vectors score -1.0.
The formula for two vectors A and B:
cosine_similarity(A, B) = (A . B) / (||A|| * ||B||)
Where A . B is the dot product (multiply corresponding elements, then sum) and ||A|| is the magnitude (square root of the sum of squared elements).
Worked example with tiny 4-dimensional vectors:
A = [0.5, 0.3, 0.8, 0.1] ("cancel my subscription")
B = [0.4, 0.35, 0.75, 0.15] ("terminate my plan")
C = [0.1, -0.6, 0.2, 0.9] ("weather forecast today")
Dot product A . B = (0.5*0.4) + (0.3*0.35) + (0.8*0.75) + (0.1*0.15)
= 0.20 + 0.105 + 0.60 + 0.015
= 0.92
||A|| = sqrt(0.25 + 0.09 + 0.64 + 0.01) = sqrt(0.99) = 0.995
||B|| = sqrt(0.16 + 0.1225 + 0.5625 + 0.0225) = sqrt(0.8675) = 0.931
cosine_similarity(A, B) = 0.92 / (0.995 * 0.931) = 0.993 (very similar)
Dot product A . C = (0.5*0.1) + (0.3*-0.6) + (0.8*0.2) + (0.1*0.9)
= 0.05 - 0.18 + 0.16 + 0.09
= 0.12
||C|| = sqrt(0.01 + 0.36 + 0.04 + 0.81) = sqrt(1.22) = 1.105
cosine_similarity(A, C) = 0.12 / (0.995 * 1.105) = 0.109 (not similar)
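The arithmetic above can be checked with a few lines of plain Python:

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

A = [0.5, 0.3, 0.8, 0.1]     # "cancel my subscription"
B = [0.4, 0.35, 0.75, 0.15]  # "terminate my plan"
C = [0.1, -0.6, 0.2, 0.9]    # "weather forecast today"

print(round(cosine_similarity(A, B), 3))  # 0.993 (very similar)
print(round(cosine_similarity(A, C), 3))  # 0.109 (not similar)
```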
The retrieval process then ranks all documents by their cosine similarity to the query and returns the top-K results:
Question Vector: [0.23, -0.45, 0.67, ...]
|
Compare to all document vectors
|
Return: "How to cancel your subscription" (similarity: 0.94)
"Ending your plan early" (similarity: 0.89)
"Refund policy" (similarity: 0.72)
Other distance metrics include Euclidean distance (L2) and dot product. Cosine similarity is the safe default when vectors may not be normalized; for L2-normalized vectors, dot product produces the same ranking and is faster to compute.
Vector Search vs Keyword Search: A Concrete Example
To make the difference tangible, consider the same query run against both systems on a knowledge base containing 500 help articles.
Query: "I got charged twice on my credit card"
Keyword search results (BM25)
BM25 looks for documents containing the terms "charged", "twice", "credit", and "card". It ranks by term frequency and inverse document frequency:
| Rank | Document Title | Why It Matched |
|---|---|---|
| 1 | "Credit Card Payment Methods" | Contains "credit card" frequently |
| 2 | "Charged Sales Tax FAQ" | Contains "charged" |
| 3 | "Credit Card Update Instructions" | Contains "credit card" |
| 4 | "Twice-yearly Billing Cycle" | Contains "twice" |
| 5 | "Credit Limit Information" | Contains "credit" |
None of these are what the user needs. The actual relevant article --- "Duplicate Payment and Refund Policy" --- does not contain the phrase "charged twice" or "credit card," so it is invisible to keyword search.
Vector search results
Vector search embeds the query and finds semantically similar documents regardless of vocabulary:
| Rank | Document Title | Similarity |
|---|---|---|
| 1 | "Duplicate Payment and Refund Policy" | 0.94 |
| 2 | "How to Request a Refund for Double Billing" | 0.91 |
| 3 | "Billing Dispute Resolution Process" | 0.87 |
| 4 | "Payment Error Troubleshooting" | 0.83 |
| 5 | "Credit Card Payment Methods" | 0.71 |
The top results directly address the user's problem. "Duplicate payment," "double billing," and "billing dispute" are all semantically close to "charged twice" even though they share few keywords.
This example illustrates why vector search is essential for any system where users phrase questions in their own words rather than using the exact terminology in your documentation. For cases where you need both semantic understanding and exact-match precision (like error codes or product SKUs), hybrid search combines the two approaches --- read our deep dive on hybrid search.
Why Vector Search Is Better for AI Applications
Understands Synonyms
"cancel," "terminate," "end," and "stop" cluster together in vector space.
Handles Paraphrasing
"I want my money back" finds "refund policy" even without shared words.
Language-Agnostic
Modern embeddings work across languages---a Spanish question can match English docs.
Typo-Tolerant
"How do I cancle my subcription" still works because the meaning is captured.
The Technical Details
Embedding Models
Popular models for text embeddings:
| Model | Dimensions | Best For |
|---|---|---|
| OpenAI text-embedding-3-large | 3072 | General purpose |
| OpenAI text-embedding-3-small | 1536 | Cost-efficient general use |
| Cohere embed-v3 | 1024 | Multilingual |
| Voyage-3 | 1024 | Long documents |
| BGE-large | 1024 | Open source option |
| nomic-embed-text-v1.5 | 768 | Open source, Matryoshka support |
Higher dimensions = more nuance, but slower search and higher storage cost. Many newer models support Matryoshka embeddings, allowing you to truncate vectors to fewer dimensions at query time to trade accuracy for speed.
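Matryoshka truncation is mechanically simple: keep the leading components and re-normalize. A minimal sketch, assuming the embedding model was actually trained with Matryoshka representation learning (the helper name is my own; a random unit vector stands in for a real embedding):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components, then re-normalize to unit length.
    Only meaningful for Matryoshka-trained embedding models."""
    truncated = np.asarray(vec, dtype=np.float32)[:dims]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a real 1536-dimensional embedding
full = np.random.default_rng(0).normal(size=1536).astype(np.float32)
full /= np.linalg.norm(full)

short = truncate_embedding(full, 512)
print(short.shape)            # 3x fewer dimensions to store and compare
print(np.linalg.norm(short))  # still unit length
```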
Chunking Strategy
Before embedding, documents are split into chunks:
Full Document (5000 words)
|
Chunk 1 (500 words): "Subscription Management..."
Chunk 2 (500 words): "Cancellation Policy..."
Chunk 3 (500 words): "Refund Process..."
Why chunk?
- LLMs have context limits
- Smaller chunks = more precise matches
- Better relevance scoring
Optimal chunk size: 500-1000 tokens with 100-200 token overlap. The overlap ensures that information spanning a chunk boundary is not lost.
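The sliding-window idea behind overlapping chunks can be sketched as follows. This is a simplified illustration using whitespace-split words as a stand-in for real tokenizer tokens; the function name and defaults are my own:

```python
def chunk_tokens(tokens, chunk_size=800, overlap=150):
    """Split a token list into chunks of `chunk_size`, each sharing
    `overlap` tokens with the previous chunk so boundary-spanning
    information appears in at least one chunk intact."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks

words = ("lorem " * 2000).split()   # stand-in for a tokenized document
chunks = chunk_tokens(words)
print(len(chunks), [len(c) for c in chunks])
```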
Choosing a Vector Database
Storing and searching vectors at scale requires specialized infrastructure. Here is how the leading options compare:
| Feature | pgvector | Pinecone | Weaviate | Qdrant |
|---|---|---|---|---|
| Type | PostgreSQL extension | Managed SaaS | Open source / cloud | Open source / cloud |
| Hosting | Self-managed or managed Postgres | Fully managed | Self-hosted or Weaviate Cloud | Self-hosted or Qdrant Cloud |
| Max Dimensions | 16,000 (2,000 for HNSW/IVFFlat indexes) | 20,000 | Unlimited | Unlimited |
| Index Types | IVFFlat, HNSW | Proprietary | HNSW | HNSW, scalar/product quantization |
| Filtering | Full SQL WHERE clauses | Metadata filters | GraphQL-based filters | Payload filters with indexed fields |
| Hybrid Search | Combine with pg full-text search | Sparse-dense vectors | Built-in BM25 + vector | Sparse vectors + dense vectors |
| Scaling | Vertical (or Citus for horizontal) | Automatic horizontal | Automatic horizontal | Horizontal sharding |
| Pricing | Free (open source) | Pay per vector + query | Free self-hosted; cloud tiers | Free self-hosted; cloud tiers |
| Best For | Teams already on Postgres; cost control | Quick start; zero ops | Feature-rich search apps | High-performance, low-latency needs |
pgvector is the right choice if you already run PostgreSQL. You get vector search without adding another database to your stack. It handles millions of vectors well on a single node, and you retain full SQL capabilities for filtering, joins, and transactions. At Chatsy, we migrated from Pinecone to pgvector and reduced our infrastructure cost while gaining operational simplicity.
Pinecone is the easiest managed option. If you want zero infrastructure management and are willing to pay for it, Pinecone handles sharding, replication, and scaling automatically. The tradeoff is vendor lock-in and cost at scale.
Weaviate stands out for built-in hybrid search and a rich module ecosystem (auto-vectorization, generative search). It is a strong choice for teams building search-heavy applications that need more than raw vector similarity.
Qdrant excels in raw query performance and offers advanced quantization options that reduce memory usage by 4-32x with minimal accuracy loss. It is well-suited for latency-sensitive applications with large vector collections.
Performance at Scale
Vector search performance depends heavily on your indexing strategy. The two dominant approaches in production are IVFFlat and HNSW.
IVFFlat (Inverted File with Flat Compression)
IVFFlat partitions vectors into clusters using k-means. At query time, it searches only the nearest clusters rather than scanning every vector.
- Build time: Fast. Indexing 1M vectors typically takes minutes.
- Query time: Moderate. Accuracy depends on nprobe (the number of clusters searched).
- Memory: Low. Stores original vectors without a graph structure.
- Tradeoff: You must choose nprobe at query time. Too low = missed results. Too high = slow queries.
```sql
-- Create an IVFFlat index in pgvector
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);  -- number of clusters

-- Tune search accuracy at query time
SET ivfflat.probes = 10;  -- search 10 of 1000 clusters
```
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph where each node connects to its approximate nearest neighbors. Queries traverse the graph from top (sparse) layers to bottom (dense) layers, narrowing in on the nearest vectors.
- Build time: Slow. Indexing 1M vectors can take hours.
- Query time: Fast. Consistently sub-millisecond for well-tuned indexes.
- Memory: High. The graph structure adds significant overhead (roughly 2-4x the raw vector size).
- Tradeoff: Best accuracy-speed ratio in production, but requires more memory and longer build times.
```sql
-- Create an HNSW index in pgvector
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);

-- Tune search accuracy at query time
SET hnsw.ef_search = 100;
```
Benchmarks
Performance numbers vary by hardware, dataset, and configuration. The following are representative benchmarks from a single-node PostgreSQL instance (8 vCPU, 32 GB RAM) with 1 million 1536-dimensional vectors:
| Metric | IVFFlat (probes=10) | IVFFlat (probes=50) | HNSW (ef=100) | HNSW (ef=400) |
|---|---|---|---|---|
| Recall@10 | 85% | 95% | 98% | 99.5% |
| P95 Latency | 8 ms | 35 ms | 5 ms | 18 ms |
| Index Build Time | 4 min | 4 min | 90 min | 90 min |
| Index Size on Disk | 6.1 GB | 6.1 GB | 9.8 GB | 9.8 GB |
For most production workloads, HNSW with ef_search=100 offers the best balance of recall and latency. Use IVFFlat when you need faster index builds (e.g., frequently updated datasets) or have tight memory constraints.
At higher scales (10M+ vectors), consider:
- Quantization: Reduce vector size from 32-bit to 8-bit or binary, cutting memory 4-32x with 1-5% recall loss.
- Partitioning: Shard vectors by tenant or category so each query scans a smaller index.
- Pre-filtering: Apply metadata filters before vector comparison to reduce the candidate set.
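The quantization idea is straightforward to illustrate. A minimal sketch of symmetric int8 scalar quantization (the function names are my own; production systems usually quantize per dimension and keep the original vectors for reranking):

```python
import numpy as np

def quantize_int8(vectors):
    """Map float32 values to int8 with a single global scale factor.
    Cuts memory 4x; recall loss is typically small for search."""
    scale = float(np.abs(vectors).max()) / 127.0
    return np.round(vectors / scale).astype(np.int8), scale

def dequantize(quantized, scale):
    """Approximate reconstruction of the original floats."""
    return quantized.astype(np.float32) * scale

rng = np.random.default_rng(42)
vecs = rng.normal(size=(1000, 1536)).astype(np.float32)

q, scale = quantize_int8(vecs)
print(vecs.nbytes // q.nbytes)  # memory reduction factor
```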
Implementing Vector Search
Python Example: Embed and Query with pgvector
This example shows the complete flow: generating an embedding, storing it, and querying it.
```python
from openai import OpenAI
import psycopg2
from pgvector.psycopg2 import register_vector  # adapts Python lists/arrays to the vector type
import numpy as np

client = OpenAI()

def embed(text: str) -> list[float]:
    """Generate an embedding for a text string."""
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# --- Store embeddings in pgvector ---
conn = psycopg2.connect("postgresql://localhost/mydb")
cur = conn.cursor()

# Enable pgvector, register the type adapter, and create the table
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
register_vector(conn)
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        embedding vector(1536)
    );
""")

# Insert documents with their embeddings
docs = [
    "How to cancel your subscription",
    "Refund policy for annual plans",
    "Pricing and plan comparison",
    "Updating your payment method",
]
for doc in docs:
    doc_embedding = embed(doc)
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        (doc, np.array(doc_embedding))
    )
conn.commit()

# Create an HNSW index for fast search
cur.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);
""")
conn.commit()

# --- Query ---
query = "I want my money back"
query_embedding = np.array(embed(query))

cur.execute("""
    SELECT content, 1 - (embedding <=> %s) AS similarity
    FROM documents
    ORDER BY embedding <=> %s
    LIMIT 3;
""", (query_embedding, query_embedding))

for content, similarity in cur.fetchall():
    print(f"{similarity:.3f}  {content}")

# Output:
# 0.891  Refund policy for annual plans
# 0.834  How to cancel your subscription
# 0.712  Pricing and plan comparison
```
JavaScript/TypeScript Example
```typescript
import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI();
const pool = new Pool({ connectionString: "postgresql://localhost/mydb" });

async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    input: text,
    model: "text-embedding-3-small",
  });
  return response.data[0].embedding;
}

async function search(query: string, limit = 5) {
  const queryEmbedding = await embed(query);

  // pgvector uses <=> for cosine distance (1 - similarity)
  const result = await pool.query(
    `SELECT content, 1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [JSON.stringify(queryEmbedding), limit]
  );
  return result.rows;
}

// Usage
const results = await search("I want my money back");
console.log(results);
// [
//   { content: "Refund policy for annual plans", similarity: 0.891 },
//   { content: "How to cancel your subscription", similarity: 0.834 },
//   ...
// ]
```
Production Considerations
For real applications:
- Use a vector database (not in-memory arrays) --- see the comparison table above.
- Implement caching for common queries. Embedding the same query repeatedly wastes time and money.
- Add hybrid search for exact matches on codes, identifiers, and proper nouns.
- Use async/batch operations for bulk embedding. OpenAI supports batching up to 2048 inputs per request.
- Monitor embedding costs. At OpenAI's pricing, embedding 1M tokens with text-embedding-3-small costs roughly $0.02, but this adds up with large document sets and frequent re-indexing.
- Version your embeddings. If you switch models or update model versions, old and new embeddings are incompatible --- you must re-embed your entire corpus.
Limitations of Pure Vector Search
Vector search is not perfect:
1. Exact Match Failures
"Error code E-1234" might not match "E-1234 error" well because embeddings focus on semantic meaning, not exact strings.
2. Rare Terms
Uncommon product names or technical terms may not embed well if they were underrepresented in the model's training data.
3. Negation Confusion
"I don't want to cancel" and "I want to cancel" have similar embeddings despite opposite meanings. Embedding models encode topic more strongly than logical operators.
4. Context Window Limits
Embedding models have maximum input lengths (typically 512 to 8192 tokens). Text beyond the limit is truncated, potentially losing critical information.
The Solution: Hybrid Search
Combine vector and keyword search:
User Question
|
+------------------+------------------+
| Vector Search | Keyword Search |
| (meaning) | (exact terms) |
+------------------+------------------+
| |
+--------+-----------+
|
Combine & Rerank
|
Final Results
This gets the best of both worlds---semantic understanding AND exact matching. For a thorough walkthrough of implementation and benchmarks, read Hybrid Search Explained: Best of Both Worlds.
How Chatsy Uses Vector Search
Our retrieval-augmented generation pipeline:
- Query Expansion: Generate synonyms and related queries
- Hybrid Search: Vector + keyword across all queries
- Reciprocal Rank Fusion: Combine results intelligently
- Reranking: Use a cross-encoder for final relevance scoring
- Context Assembly: Select best chunks for the LLM
This multi-stage approach delivers 94%+ relevant answer rates. If you are evaluating whether to use RAG or fine-tune a model for your chatbot, see our comparison of RAG vs fine-tuning for chatbots.
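The Reciprocal Rank Fusion step in the pipeline above can be sketched in a few lines. The document IDs here are hypothetical; k=60 is the constant from the original RRF formulation:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists: each document scores sum(1 / (k + rank))
    across every list it appears in, so documents ranked well by
    multiple retrievers rise to the top."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top results from each retriever
vector_hits  = ["refund-policy", "cancel-guide", "billing-dispute"]
keyword_hits = ["billing-dispute", "refund-policy", "payment-methods"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
# ['refund-policy', 'billing-dispute', 'cancel-guide', 'payment-methods']
```

Documents found by both retrievers ("refund-policy", "billing-dispute") outrank those found by only one, which is exactly the behavior hybrid search wants.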
Key Takeaways
- Vector search understands meaning, not just words
- Embeddings are numerical representations generated by transformer models that encode semantic similarity
- Cosine similarity measures the angle between vectors to determine relevance
- HNSW indexing provides the best speed-accuracy tradeoff for production workloads
- Hybrid search combines vector and keyword approaches for comprehensive retrieval
- pgvector offers a strong, cost-effective starting point for teams already on PostgreSQL
- Chunking matters --- 500-1000 tokens with overlap keeps precision high
Try It Today
Chatsy handles all this complexity for you. Upload your docs, and we automatically:
- Chunk content optimally
- Generate embeddings
- Enable hybrid search
- Apply query expansion
- Rerank results
Want more? Read about hybrid search and query expansion.
Frequently Asked Questions
What is vector search?
Vector search is a technique that finds relevant content by meaning rather than exact keywords. It converts text into numerical embeddings (vectors) using transformer-based AI models and compares them for similarity using metrics like cosine similarity---so "cancel my plan" matches "terminate subscription" even without shared words. It powers modern AI chatbots, retrieval-augmented generation systems, and semantic search.
How does vector search differ from keyword search?
Keyword search matches exact words: "cancel" only finds documents containing "cancel." Vector search understands meaning: "I want my money back" finds "refund policy" even with no shared terms. Vector search handles synonyms, paraphrasing, and typos; keyword search does not. For production, combine both in hybrid search for best results.
What are embeddings?
Embeddings are numerical representations of text produced by transformer-based AI models. Each piece of text becomes a list of numbers (e.g., 1024 or 3072 dimensions) that capture its meaning. Similar meanings produce similar vectors---"cancel" and "terminate" have close embeddings. The model generates these by processing text through self-attention layers and pooling the output into a fixed-length vector. Embeddings enable similarity search across large document collections.
What are the main use cases for vector search?
Vector search powers AI chatbots, semantic search in help centers, document retrieval for RAG systems, and recommendation engines. It excels when users phrase questions differently than your documentation (synonyms, paraphrasing) or when you need meaning-based matching across multilingual or long-form content.
Is vector search faster than keyword search?
Vector search can be fast with proper indexing (HNSW delivers sub-10ms P95 latency at 1M vectors) but typically requires more compute and memory than simple keyword lookup. For production, use a specialized vector database (pgvector, Pinecone, Weaviate, or Qdrant) and combine with keyword search in hybrid retrieval---then rerank for best relevance and performance.
Which vector database should I choose?
If you already use PostgreSQL, start with pgvector --- it avoids adding a new database to your stack and handles millions of vectors on a single node. For zero-ops managed infrastructure, consider Pinecone. For built-in hybrid search features, look at Weaviate. For maximum query performance with advanced quantization, try Qdrant. Read about our migration from Pinecone to pgvector for a real-world comparison.
What is the difference between IVFFlat and HNSW indexing?
IVFFlat partitions vectors into clusters and searches only nearby clusters at query time. It builds quickly but requires tuning the number of clusters to probe. HNSW builds a multi-layer graph connecting approximate nearest neighbors, delivering faster queries and higher recall at the cost of longer build times and more memory. For most production workloads, HNSW is the better default.