# Why We Migrated from Pinecone to pgvector: A 97% Cost Reduction Story
How we achieved massive cost savings while improving performance by moving to PostgreSQL with the pgvector extension.

When we started Chatsy, we chose Pinecone for vector storage. It was the obvious choice — purpose-built for vector search, great developer experience, and excellent performance.
But as we scaled, our Pinecone bill grew from $100/month to over $3,000/month. We knew there had to be a better way.
## The Decision to Migrate
Our requirements were clear:
- Performance: Sub-100ms query latency at the 95th percentile
- Scale: Support for 10M+ vectors
- Cost: Significant reduction from $3,000/month
- Reliability: 99.9% uptime SLA
After evaluating options (Weaviate, Milvus, Qdrant, pgvector), we chose pgvector — the PostgreSQL extension for vector similarity search.
## Why pgvector?

### 1. Unified Data Layer

With pgvector, our vectors live alongside our relational data. No more syncing between databases. One source of truth.

```sql
SELECT content, embedding <=> query_embedding AS distance
FROM documents
WHERE chatbot_id = $1
ORDER BY distance
LIMIT 10;
```
### 2. Mature Ecosystem
PostgreSQL has 30+ years of battle-tested reliability. We get:
- ACID transactions
- Point-in-time recovery
- Connection pooling (PgBouncer)
- Mature monitoring tools
### 3. Cost Efficiency
Our new infrastructure costs $90/month on a managed PostgreSQL instance. That's a 97% reduction.
## The Migration Process

### Phase 1: Schema Design

We added a vector column to our existing documents table:

```sql
ALTER TABLE documents ADD COLUMN embedding vector(1536);

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```
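The `lists = 100` setting is only the build-time half of IVFFlat tuning; recall at query time is controlled by the `ivfflat.probes` setting. A sketch of raising it for a session — the value `10` here is an illustrative starting point, not the number we ran in production:

```sql
-- IVFFlat scans `probes` lists per query: higher means better recall
-- but slower queries. The default is 1; a common starting point is
-- sqrt(lists).
SET ivfflat.probes = 10;

SELECT content, embedding <=> query_embedding AS distance
FROM documents
ORDER BY distance
LIMIT 10;
```

Because it is a session-level setting, you can use different `probes` values for latency-sensitive and recall-sensitive query paths.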
### Phase 2: Backfill

We wrote a migration script to copy vectors from Pinecone:

```typescript
async function backfillVectors() {
  // Pinecone's fetch response wraps the records in a `vectors` map
  const { vectors } = await pinecone.fetch({ ids: documentIds });
  for (const [id, vector] of Object.entries(vectors)) {
    await prisma.document.update({
      where: { id },
      data: { embedding: vector.values },
    });
  }
}
```
### Phase 3: Dual-Write

During the transition, we wrote to both databases:

```typescript
async function indexDocument(doc: Document, embedding: number[]) {
  // Write to both stores until the cutover
  await Promise.all([
    pinecone.upsert([{ id: doc.id, values: embedding }]),
    prisma.document.update({ where: { id: doc.id }, data: { embedding } }),
  ]);
}
```
### Phase 4: Cutover

After validating that pgvector results matched Pinecone, we switched reads:

```typescript
// Before
const results = await pinecone.query({ vector, topK: 10 });

// After: serialize the embedding as a pgvector literal before casting
const vectorLiteral = `[${vector.join(",")}]`;
const results = await prisma.$queryRaw`
  SELECT id, content, embedding <=> ${vectorLiteral}::vector AS distance
  FROM documents
  WHERE chatbot_id = ${chatbotId}
  ORDER BY distance
  LIMIT 10
`;
```
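One sanity check worth running during validation is `EXPLAIN ANALYZE` on the new read path, to confirm the query hits the IVFFlat index rather than falling back to a sequential scan. A sketch (the tenant ID is a placeholder, and `query_embedding` stands in for a real vector literal):

```sql
EXPLAIN ANALYZE
SELECT id, content, embedding <=> query_embedding AS distance
FROM documents
WHERE chatbot_id = 'tenant-123'
ORDER BY distance
LIMIT 10;

-- Look for an index scan on the ivfflat index in the plan output;
-- a Seq Scan here usually means the planner skipped the index.
```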
## Performance Results
| Metric | Pinecone | pgvector | Change |
|---|---|---|---|
| P50 Latency | 45ms | 38ms | -16% |
| P95 Latency | 120ms | 85ms | -29% |
| P99 Latency | 250ms | 150ms | -40% |
| Monthly Cost | $3,000 | $90 | -97% |
Yes, pgvector is actually faster for our use case. The co-location of vector and metadata eliminates network round trips.
## Lessons Learned

1. **Start with IVFFlat, consider HNSW**: IVFFlat is simpler and works great up to ~1M vectors. HNSW is better at larger scales but uses more memory.
2. **Tune your `lists` parameter**: Too few lists = slow queries. Too many = slow inserts. We settled on `sqrt(num_vectors)`.
3. **Use partial indexes**: If you filter by tenant, create partial indexes per tenant for dramatic speedups.
4. **Monitor vacuum**: Vector columns are large. Aggressive vacuuming prevents bloat.
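These lessons map to concrete DDL. A sketch against the `documents` table from the migration — the HNSW parameters, the tenant ID, and the autovacuum threshold are illustrative defaults, not our production values:

```sql
-- Lesson 1: HNSW index for larger scales (requires pgvector 0.5.0+)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Lesson 3: partial index for a single high-traffic tenant;
-- the planner uses it when the query filters on the same chatbot_id
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100)
  WHERE chatbot_id = 'tenant-123';

-- Lesson 4: vacuum this table more aggressively than the 20% default,
-- since each row carries a large embedding
ALTER TABLE documents SET (autovacuum_vacuum_scale_factor = 0.05);
```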
## Conclusion
The migration took 3 weeks of engineering time and saved us $35,000/year. More importantly, it simplified our architecture and improved performance.
Not every workload is suitable for pgvector — if you need billion-scale vectors with sub-10ms latency, dedicated vector databases still make sense. But for most applications, pgvector is the pragmatic choice.