Why We Migrated from Pinecone to pgvector: A 97% Cost Reduction Story
How we achieved massive cost savings while improving performance by moving to PostgreSQL with pgvector extension. A detailed technical guide on vector database migration.

When we started Chatsy, we chose Pinecone for vector storage. It was the obvious choice — purpose-built for vector search, great developer experience, and excellent performance. The managed service meant we could focus on building our product rather than managing infrastructure.
But as we scaled, our Pinecone bill grew from $100/month to over $3,000/month. With hundreds of chatbots and millions of document chunks, the cost trajectory was unsustainable. We knew there had to be a better way.
This is the story of how we migrated to pgvector, reduced our costs by 97%, and actually improved performance in the process.
The Case for Migration
Before diving into the technical details, let's establish why we decided to migrate in the first place. This wasn't a decision we took lightly — Pinecone was working well technically. The issues were primarily economic and architectural.
The Cost Problem
Our Pinecone usage followed a predictable pattern: every new customer meant more vectors, and more vectors meant higher bills. The pricing model, based on vector count and query volume, created a linear relationship between growth and cost.
| Month | Vectors | Monthly Cost |
|---|---|---|
| Jan | 500K | $100 |
| Mar | 1.2M | $450 |
| Jun | 3.5M | $1,200 |
| Sep | 8M | $3,000 |
At this trajectory, we were looking at $10,000+ monthly bills within a year. For a startup, that's a significant burn rate for a single infrastructure component.
The Architecture Problem
Beyond cost, we faced an architectural challenge: data synchronization. Our document metadata lived in PostgreSQL, but the embeddings lived in Pinecone. Every query required four steps:
- Query Pinecone for similar vectors
- Extract document IDs from the results
- Fetch the full documents from PostgreSQL
- Merge and return the results
This dual-database pattern created synchronization challenges, increased latency, and complicated our codebase. When documents were deleted, we had to remember to delete from both systems. When reindexing, we had to coordinate across services.
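As a rough sketch of that read path (function names and shapes here are illustrative, not our actual code), the old flow was a two-round-trip composition:

```typescript
// Illustrative sketch of the old dual-database read path.
// vectorSearch and fetchDocs stand in for the Pinecone and
// PostgreSQL clients; both names are hypothetical.
type Match = { id: string; score: number };
type Doc = { id: string; content: string };

async function searchDocuments(
  queryEmbedding: number[],
  vectorSearch: (v: number[], topK: number) => Promise<Match[]>, // Pinecone
  fetchDocs: (ids: string[]) => Promise<Doc[]>                   // PostgreSQL
): Promise<(Doc & { score: number })[]> {
  // Round trip 1: similarity search returns only IDs and scores
  const matches = await vectorSearch(queryEmbedding, 10);

  // Round trip 2: hydrate the matches from the relational store
  const docs = await fetchDocs(matches.map(m => m.id));

  // Merge: attach scores, preserving similarity order
  const byId = new Map(docs.map(d => [d.id, d]));
  return matches
    .filter(m => byId.has(m.id)) // an ID deleted from only one store silently drops out
    .map(m => ({ ...byId.get(m.id)!, score: m.score }));
}
```

Note the filter on the merge step: it is exactly where out-of-sync deletes hide, because a document removed from one store but not the other just vanishes from results instead of failing loudly.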
Evaluating Alternatives
We evaluated four main alternatives to Pinecone:
Weaviate: Excellent vector database with built-in vectorization. However, running it ourselves added operational complexity, and the managed offering had similar cost concerns.
Milvus: Powerful and scalable, but designed for much larger deployments than we needed. The operational overhead didn't match our team size.
Qdrant: Modern and performant, but relatively new. We were concerned about long-term stability and ecosystem maturity.
pgvector: PostgreSQL extension for vector similarity search. Adds vector capabilities directly to our existing database, with no new service to run.
We chose pgvector for several compelling reasons.
Why pgvector Won
1. Unified Data Layer
The most significant advantage was eliminating the dual-database problem. With pgvector, our vectors live alongside our relational data in the same database, same transactions, same backups.
```sql
SELECT content, embedding <=> query_embedding AS distance
FROM documents
WHERE chatbot_id = $1
  AND organization_id = $2
  AND deleted_at IS NULL
ORDER BY distance
LIMIT 10;
```
This single query combines vector similarity search with relational filtering — something that previously required multiple round trips. The simplification was dramatic.
2. Mature Ecosystem
PostgreSQL has 30+ years of battle-tested reliability. By building on PostgreSQL, we inherited:
- ACID transactions: Our vector operations are fully transactional
- Point-in-time recovery: We can restore vectors to any moment in time
- Connection pooling: PgBouncer works identically for vector queries
- Replication: Our vectors are replicated with the rest of our data
- Monitoring: Existing tools (pg_stat, EXPLAIN ANALYZE) work perfectly
- Team expertise: Every developer knows PostgreSQL
We didn't have to learn a new database, configure new monitoring, or maintain separate backup procedures.
3. Dramatic Cost Efficiency
Our new infrastructure costs $90/month on a managed PostgreSQL instance (we use Supabase). That's a 97% reduction from our Pinecone bill.
The math is straightforward: managed PostgreSQL instances are commodity infrastructure with mature, competitive pricing. Vector databases are specialized tools with specialized pricing.
The Migration Process
Migrating production data between databases requires careful planning. We broke the migration into four phases over three weeks.
Phase 1: Schema Design (Week 1)
We added a vector column to our existing documents table:
```sql
-- Add the vector column
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Create an index for similarity search
CREATE INDEX documents_embedding_idx ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
We chose ivfflat indexing over hnsw because:
- Lower memory overhead for our scale
- Simpler to tune and maintain
- Excellent performance up to ~10M vectors
The lists = 100 parameter determines the number of clusters. We used the rule of thumb of sqrt(num_vectors) and planned to adjust as we grew.
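That rule of thumb is easy to encode. The helper below is hypothetical (pgvector has no such function); it just documents how the starting value is derived:

```typescript
// Hypothetical helper (not part of pgvector): derive an ivfflat
// `lists` value from the current row count using the common
// sqrt(num_vectors) rule of thumb.
function suggestLists(numVectors: number): number {
  if (numVectors <= 0) throw new Error("need at least one vector");
  // Round to the nearest integer; pgvector requires lists >= 1
  return Math.max(1, Math.round(Math.sqrt(numVectors)));
}
```

Recomputing this as the table grows tells you when the index is due for a rebuild with a larger lists value.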
Phase 2: Backfill (Week 1-2)
We wrote a migration script to copy all existing vectors from Pinecone to PostgreSQL:
```typescript
async function backfillVectors() {
  const batchSize = 1000;
  let cursor = null;

  do {
    // Fetch a batch of vectors from Pinecone
    const response = await pinecone.list({ limit: batchSize, cursor });
    const ids = response.vectors.map(v => v.id);
    const vectors = await pinecone.fetch({ ids });

    // Batch insert into PostgreSQL
    const updates = Object.entries(vectors).map(([id, vector]) => ({
      where: { id },
      data: { embedding: vector.values }
    }));
    await prisma.$transaction(
      updates.map(u => prisma.document.update(u))
    );

    cursor = response.nextCursor;
    console.log(`Migrated ${ids.length} vectors`);
  } while (cursor);
}
```
The backfill ran over a weekend, processing approximately 500,000 vectors per hour. We monitored for errors and reran failed batches.
Phase 3: Dual-Write (Week 2)
During the transition period, we wrote to both databases simultaneously. This ensured pgvector stayed in sync while we validated results:
```typescript
async function indexDocument(doc: Document, embedding: number[]) {
  // Write to both systems in parallel
  await Promise.all([
    pinecone.upsert([{
      id: doc.id,
      values: embedding,
      metadata: { chatbotId: doc.chatbotId }
    }]),
    prisma.document.update({
      where: { id: doc.id },
      data: { embedding }
    })
  ]);
}
```
We also built a validation system that ran queries against both databases and compared results. This caught several edge cases before they became production issues.
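The core of that validation can be sketched as a top-K overlap check between the two result sets (the function name and shape are hypothetical, not our production code):

```typescript
// Hypothetical validation metric: the fraction of Pinecone's top-K
// result IDs that also appear in pgvector's top-K for the same query.
// Averaged across many queries, this approximates recall parity.
function topKOverlap(pineconeIds: string[], pgvectorIds: string[]): number {
  if (pineconeIds.length === 0) return 1; // nothing to miss
  const pg = new Set(pgvectorIds);
  const hits = pineconeIds.filter(id => pg.has(id)).length;
  return hits / pineconeIds.length;
}
```

Because ivfflat is an approximate index, an overlap slightly below 1.0 is expected; what you are looking for is a stable baseline, with sudden drops flagging configuration or sync bugs.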
Phase 4: Cutover (Week 3)
After validating that pgvector results matched Pinecone results at 99.9% accuracy, we switched reads to pgvector:
```typescript
// Before: Pinecone query
const results = await pinecone.query({
  vector: queryEmbedding,
  topK: 10,
  filter: { chatbotId }
});

// After: pgvector query
const results = await prisma.$queryRaw<DocumentResult[]>`
  SELECT id, content, metadata,
         embedding <=> ${queryEmbedding}::vector AS distance
  FROM documents
  WHERE chatbot_id = ${chatbotId}
    AND embedding IS NOT NULL
  ORDER BY distance
  LIMIT 10
`;
```
We kept Pinecone running in read-only mode for two weeks as a fallback, then decommissioned it entirely.
Performance Results
The performance improvements surprised us. We expected pgvector to be slightly slower — a reasonable trade-off for the cost savings. Instead:
| Metric | Pinecone | pgvector | Change |
|---|---|---|---|
| P50 Latency | 45ms | 38ms | -16% |
| P95 Latency | 120ms | 85ms | -29% |
| P99 Latency | 250ms | 150ms | -40% |
| Monthly Cost | $3,000 | $90 | -97% |
Yes, pgvector is actually faster for our use case. The primary reason: co-location of vector and metadata eliminates network round trips. When both data types live in the same database, there's no network latency between the similarity search and the metadata fetch.
Lessons Learned
After running pgvector in production for six months, here are our key takeaways:
Start with IVFFlat, Consider HNSW Later
IVFFlat indexing is simpler, uses less memory, and works excellently up to ~10M vectors. HNSW provides better accuracy at large scale but requires significantly more memory. Don't prematurely optimize.
Tune Your Lists Parameter
The lists parameter in IVFFlat determines the number of clusters. Too few lists means slow queries (scanning too many vectors). Too many lists means slow inserts (updating too many clusters). We use sqrt(num_vectors) as a starting point and adjust based on monitoring.
Use Partial Indexes for Multi-Tenancy
If you filter by tenant (which we always do), create partial indexes:
```sql
CREATE INDEX documents_embedding_org1_idx ON documents
USING ivfflat (embedding vector_cosine_ops)
WHERE organization_id = 'org_123';
```
This dramatically improves query performance for large tenants.
Monitor Vacuum Aggressively
Vector columns are large (1536 dimensions × 4 bytes = 6KB per row). Dead tuples accumulate faster than with typical data. We run aggressive vacuuming to prevent bloat.
Plan for Index Rebuilds
IVFFlat indexes need rebuilding when data distribution changes significantly. We rebuild indexes quarterly, scheduled during low-traffic periods.
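On PostgreSQL 12 and later, a rebuild can usually be done without blocking traffic (index name from the schema above; check lock behavior on your version before running in production):

```sql
-- Rebuild the ivfflat index in place; CONCURRENTLY avoids
-- blocking reads and writes during the rebuild (PostgreSQL 12+).
REINDEX INDEX CONCURRENTLY documents_embedding_idx;
```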
Conclusion
The migration took 3 weeks of engineering time and saved us $35,000/year. More importantly, it simplified our architecture, reduced operational complexity, and actually improved performance.
Not every workload is suitable for pgvector. If you need billion-scale vectors with sub-10ms latency, dedicated vector databases still make sense. But for most applications — especially those already using PostgreSQL — pgvector is the pragmatic choice.
For more on how we build Chatsy's infrastructure, check out our posts on hybrid search and building for scale.