
Why We Migrated from Pinecone to pgvector: A 97% Cost Reduction Story

How we achieved massive cost savings while improving performance by moving to PostgreSQL with pgvector extension. A detailed technical guide on vector database migration.

Asad Ali
Founder & CEO
December 10, 2024
8 min read

When we started Chatsy, we chose Pinecone for vector storage. It was the obvious choice — purpose-built for vector search, great developer experience, and excellent performance. The managed service meant we could focus on building our product rather than managing infrastructure.

But as we scaled, our Pinecone bill grew from $100/month to over $3,000/month. With hundreds of chatbots and millions of document chunks, the cost trajectory was unsustainable. We knew there had to be a better way.

This is the story of how we migrated to pgvector, reduced our costs by 97%, and actually improved performance in the process.

The Case for Migration

Before diving into the technical details, let's establish why we decided to migrate in the first place. This wasn't a decision we took lightly — Pinecone was working well technically. The issues were primarily economic and architectural.

The Cost Problem

Our Pinecone usage followed a predictable pattern: every new customer meant more vectors, and more vectors meant higher bills. The pricing model, based on vector count and query volume, created a linear relationship between growth and cost.

| Month | Vectors | Monthly Cost |
|-------|---------|--------------|
| Jan   | 500K    | $100         |
| Mar   | 1.2M    | $450         |
| Jun   | 3.5M    | $1,200       |
| Sep   | 8M      | $3,000       |

At this trajectory, we were looking at $10,000+ monthly bills within a year. For a startup, that's a significant burn rate for a single infrastructure component.

The Architecture Problem

Beyond cost, we faced an architectural challenge: data synchronization. Our document metadata lived in PostgreSQL, but the embeddings lived in Pinecone. Every query required:

  1. Query Pinecone for similar vectors
  2. Get document IDs from results
  3. Fetch full documents from PostgreSQL
  4. Merge and return results

This dual-database pattern created synchronization challenges, increased latency, and complicated our codebase. When documents were deleted, we had to remember to delete from both systems. When reindexing, we had to coordinate across services.
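For illustration, the merge step of that four-part flow can be sketched as a small pure function. The `PineconeMatch` and `DocumentRow` shapes here are hypothetical stand-ins, not our production types:

```typescript
// Hypothetical shapes for illustration; not our production types.
interface PineconeMatch { id: string; score: number }
interface DocumentRow { id: string; content: string }

// Step 4 of the flow above: join vector-search hits with the rows
// fetched from PostgreSQL, preserving similarity order.
function mergeResults(
  matches: PineconeMatch[],
  rows: DocumentRow[],
): Array<DocumentRow & { score: number }> {
  const byId = new Map(rows.map((r) => [r.id, r]));
  return matches
    // A row may have been deleted in one system but not the other —
    // exactly the synchronization hazard described above.
    .filter((m) => byId.has(m.id))
    .map((m) => ({ ...byId.get(m.id)!, score: m.score }));
}
```

Even this simplified version has to handle the case where the two systems disagree, which is the crux of the problem.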

Evaluating Alternatives

We evaluated four main alternatives to Pinecone:

Weaviate: Excellent vector database with built-in vectorization. However, running it ourselves added operational complexity, and the managed offering had similar cost concerns.

Milvus: Powerful and scalable, but designed for much larger deployments than we needed. The operational overhead didn't match our team size.

Qdrant: Modern and performant, but relatively new. We were concerned about long-term stability and ecosystem maturity.

pgvector: PostgreSQL extension for vector similarity search. Adds vector capabilities to our existing database.

We chose pgvector for several compelling reasons.

Why pgvector Won

1. Unified Data Layer

The most significant advantage was eliminating the dual-database problem. With pgvector, our vectors live alongside our relational data in the same database, same transactions, same backups.

```sql
SELECT content,
       embedding <=> query_embedding AS distance
FROM documents
WHERE chatbot_id = $1
  AND organization_id = $2
  AND deleted_at IS NULL
ORDER BY distance
LIMIT 10;
```

This single query combines vector similarity search with relational filtering — something that previously required multiple round trips. The simplification was dramatic.

2. Mature Ecosystem

PostgreSQL has 30+ years of battle-tested reliability. By building on PostgreSQL, we inherited:

  • ACID transactions: Our vector operations are fully transactional
  • Point-in-time recovery: We can restore vectors to any moment in time
  • Connection pooling: PgBouncer works identically for vector queries
  • Replication: Our vectors are replicated with the rest of our data
  • Monitoring: Existing tools (pg_stat, EXPLAIN ANALYZE) work perfectly
  • Team expertise: Every developer knows PostgreSQL

We didn't have to learn a new database, configure new monitoring, or maintain separate backup procedures.

3. Dramatic Cost Efficiency

Our new infrastructure costs $90/month on a managed PostgreSQL instance (we use Supabase). That's a 97% reduction from our Pinecone bill.

The math is straightforward: managed PostgreSQL instances are commodity infrastructure with mature, competitive pricing. Vector databases are specialized tools with specialized pricing.

The Migration Process

Migrating production data between databases requires careful planning. We broke the migration into four phases over three weeks.

Phase 1: Schema Design (Week 1)

We added a vector column to our existing documents table:

```sql
-- Add the vector column
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Create an index for similarity search
CREATE INDEX documents_embedding_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

We chose ivfflat indexing over hnsw because:

  • Lower memory overhead for our scale
  • Simpler to tune and maintain
  • Excellent performance up to ~10M vectors

The lists = 100 parameter determines the number of clusters. We used the rule of thumb of sqrt(num_vectors) and planned to adjust as we grew.
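As a sketch, the pgvector documentation's guidance — roughly rows / 1000 for up to about a million rows, and sqrt(rows) beyond that (the sqrt rule mentioned above is the large-table half of that guidance) — can be captured in a small helper. `chooseLists` is a hypothetical name, not part of pgvector:

```typescript
// Hypothetical helper based on pgvector's documented guidance:
// lists ≈ rows / 1000 for up to ~1M rows, sqrt(rows) above that.
function chooseLists(numVectors: number): number {
  if (numVectors <= 1_000_000) {
    return Math.max(1, Math.round(numVectors / 1000));
  }
  return Math.round(Math.sqrt(numVectors));
}
```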

Phase 2: Backfill (Week 1-2)

We wrote a migration script to copy all existing vectors from Pinecone to PostgreSQL:

```typescript
async function backfillVectors() {
  const batchSize = 1000;
  let cursor = null;

  do {
    // Fetch a batch of vector IDs from Pinecone
    const response = await pinecone.list({ limit: batchSize, cursor });
    const ids = response.vectors.map(v => v.id);
    const vectors = await pinecone.fetch({ ids });

    // Batch-update the matching rows in PostgreSQL
    const updates = Object.entries(vectors).map(([id, vector]) => ({
      where: { id },
      data: { embedding: vector.values }
    }));
    await prisma.$transaction(
      updates.map(u => prisma.document.update(u))
    );

    cursor = response.nextCursor;
    console.log(`Migrated ${ids.length} vectors`);
  } while (cursor);
}
```

The backfill ran over a weekend, processing approximately 500,000 vectors per hour. We monitored for errors and reran failed batches.
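We won't reproduce our exact error handling here, but the "rerun failed batches" idea can be sketched as a generic retry wrapper. `withRetry` is a hypothetical helper, not a function from our codebase:

```typescript
// Hypothetical retry helper: run a batch operation, retrying on
// failure up to a capped number of attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // In production you would also log the failure and back off here.
    }
  }
  throw lastError;
}
```

Wrapping each batch in something like this meant a transient Pinecone or PostgreSQL error cost us one retry rather than a restarted migration.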

Phase 3: Dual-Write (Week 2)

During the transition period, we wrote to both databases simultaneously. This ensured pgvector stayed in sync while we validated results:

```typescript
async function indexDocument(doc: Document, embedding: number[]) {
  // Write to both systems in parallel
  await Promise.all([
    pinecone.upsert([{
      id: doc.id,
      values: embedding,
      metadata: { chatbotId: doc.chatbotId }
    }]),
    prisma.document.update({
      where: { id: doc.id },
      data: { embedding }
    })
  ]);
}
```

We also built a validation system that ran queries against both databases and compared results. This caught several edge cases before they became production issues.
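One simple check such a validation system can run is top-k overlap between the two result lists for the same query. A sketch, with a hypothetical function name:

```typescript
// Hypothetical validation metric: the fraction of Pinecone's top-k
// result IDs that also appear in pgvector's top-k for the same query.
function topKOverlap(pineconeIds: string[], pgvectorIds: string[]): number {
  if (pineconeIds.length === 0) return 1;
  const pg = new Set(pgvectorIds);
  const hits = pineconeIds.filter((id) => pg.has(id)).length;
  return hits / pineconeIds.length;
}
```

Because approximate indexes on both sides can legitimately return slightly different neighbor sets, comparing ID overlap (rather than exact ordering) avoids flagging harmless rank swaps as failures.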

Phase 4: Cutover (Week 3)

After validating that pgvector results matched Pinecone results at 99.9% accuracy, we switched reads to pgvector:

```typescript
// Before: Pinecone query
const results = await pinecone.query({
  vector: queryEmbedding,
  topK: 10,
  filter: { chatbotId }
});

// After: pgvector query
// The embedding array is serialized to pgvector's '[1,2,3]' text form
// before casting, since a raw number[] parameter can't be cast directly.
const results = await prisma.$queryRaw<DocumentResult[]>`
  SELECT id, content, metadata,
         embedding <=> ${JSON.stringify(queryEmbedding)}::vector AS distance
  FROM documents
  WHERE chatbot_id = ${chatbotId}
    AND embedding IS NOT NULL
  ORDER BY distance
  LIMIT 10
`;
```

We kept Pinecone running in read-only mode for two weeks as a fallback, then decommissioned it entirely.

Performance Results

The performance improvements surprised us. We expected pgvector to be slightly slower — a reasonable trade-off for the cost savings. Instead:

| Metric       | Pinecone | pgvector | Change |
|--------------|----------|----------|--------|
| P50 Latency  | 45ms     | 38ms     | -16%   |
| P95 Latency  | 120ms    | 85ms     | -29%   |
| P99 Latency  | 250ms    | 150ms    | -40%   |
| Monthly Cost | $3,000   | $90      | -97%   |

Yes, pgvector is actually faster for our use case. The primary reason: co-location of vector and metadata eliminates network round trips. When both data types live in the same database, there's no network latency between the similarity search and the metadata fetch.

Lessons Learned

After running pgvector in production for six months, here are our key takeaways:

Start with IVFFlat, Consider HNSW Later

IVFFlat indexing is simpler, uses less memory, and works excellently up to ~10M vectors. HNSW provides better accuracy at large scale but requires significantly more memory. Don't prematurely optimize.
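For reference, moving to HNSW later is a one-index change. A sketch using pgvector's HNSW syntax — the `m` and `ef_construction` values shown are pgvector's defaults, not tuned numbers:

```sql
-- Sketch only: an HNSW equivalent of the ivfflat index above.
-- m and ef_construction are shown at pgvector's default values.
CREATE INDEX documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```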

Tune Your Lists Parameter

The lists parameter in IVFFlat determines the number of clusters. Too few lists means slow queries (scanning too many vectors). Too many lists means slow inserts (updating too many clusters). We use sqrt(num_vectors) as a starting point and adjust based on monitoring.
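The query-time counterpart is `ivfflat.probes`, which controls how many clusters each query scans — raising it trades latency for recall. A sketch:

```sql
-- Scan more clusters per query for better recall (the default is 1).
SET ivfflat.probes = 10;

-- Or scoped to one transaction, so a single expensive query
-- doesn't change the setting for the whole session:
BEGIN;
SET LOCAL ivfflat.probes = 20;
-- ... run the similarity query here ...
COMMIT;
```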

Use Partial Indexes for Multi-Tenancy

If you filter by tenant (which we always do), create partial indexes:

```sql
CREATE INDEX documents_embedding_org1_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WHERE organization_id = 'org_123';
```

This dramatically improves query performance for large tenants.

Monitor Vacuum Aggressively

Vector columns are large (1536 dimensions × 4 bytes = 6KB per row). Dead tuples accumulate faster than with typical data. We run aggressive vacuuming to prevent bloat.
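That per-row arithmetic generalizes. A quick sketch for estimating raw embedding storage — it ignores row headers, TOAST, and the index itself, so treat the result as a floor, not a forecast:

```typescript
// Rough raw-storage estimate for float4 embeddings:
// dimensions × 4 bytes per vector, ignoring row headers,
// TOAST overhead, and the index.
function embeddingStorageBytes(numVectors: number, dimensions = 1536): number {
  return numVectors * dimensions * 4;
}
```

At our September scale of 8M vectors, that lower bound alone is roughly 49 GB of embedding data, which is why dead tuples hurt so quickly.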

Plan for Index Rebuilds

IVFFlat indexes need rebuilding when data distribution changes significantly. We rebuild indexes quarterly, scheduled during low-traffic periods.
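The rebuild itself is a single statement; on PostgreSQL 12+ it can run without blocking reads and writes (sketch):

```sql
-- Rebuild the ivfflat index without taking exclusive locks
-- (CONCURRENTLY requires PostgreSQL 12 or later).
REINDEX INDEX CONCURRENTLY documents_embedding_idx;
```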

Conclusion

The migration took 3 weeks of engineering time and saved us $35,000/year. More importantly, it simplified our architecture, reduced operational complexity, and actually improved performance.

Not every workload is suitable for pgvector. If you need billion-scale vectors with sub-10ms latency, dedicated vector databases still make sense. But for most applications — especially those already using PostgreSQL — pgvector is the pragmatic choice.

For more on how we build Chatsy's infrastructure, check out our posts on hybrid search and building for scale.


Tags:
#infrastructure
#postgresql
#pgvector
#cost-optimization

