
Pinecone to pgvector: A 97% Cost Reduction

How we cut vector database costs 97% by migrating from Pinecone to pgvector. A detailed technical guide on the migration.

Chatsy Team
Engineering
December 10, 2024 · Updated: January 15, 2026
10 min read

When we started Chatsy, we chose Pinecone for vector storage. It was the obvious choice — purpose-built for vector search, great developer experience, and excellent performance. The managed service meant we could focus on building our product rather than managing infrastructure.

TL;DR:

  • Migrating from Pinecone to pgvector cut our vector database costs by 97% — from $3,000/month to $90/month.
  • Query latency actually improved (P95 dropped from 120ms to 85ms) because co-locating vectors with metadata eliminates network round trips.
  • The migration took 3 weeks across four phases: schema design, backfill, dual-write validation, and cutover.
  • For most applications already on PostgreSQL, pgvector is the pragmatic choice — you get ACID transactions, unified backups, and familiar tooling with no extra infrastructure.

But as we scaled, our Pinecone bill grew from $100/month to over $3,000/month. With hundreds of chatbots and millions of document chunks, the cost trajectory was unsustainable. We knew there had to be a better way.

This is the story of how we migrated to pgvector, reduced our costs by 97%, and actually improved performance in the process.

The Case for Migration

Before diving into the technical details, let's establish why we decided to migrate in the first place. This wasn't a decision we took lightly — Pinecone was working well technically. The issues were primarily economic and architectural.

The Cost Problem

Our Pinecone usage followed a predictable pattern: every new customer meant more vectors, and more vectors meant higher bills. The pricing model, based on vector count and query volume, created a linear relationship between growth and cost.

| Month | Vectors | Monthly Cost |
|-------|---------|--------------|
| Jan   | 500K    | $100         |
| Mar   | 1.2M    | $450         |
| Jun   | 3.5M    | $1,200       |
| Sep   | 8M      | $3,000       |

At this trajectory, we were looking at $10,000+ monthly bills within a year. For a startup, that's a significant burn rate for a single infrastructure component.

The Architecture Problem

Beyond cost, we faced an architectural challenge: data synchronization. Our document metadata lived in PostgreSQL, but the embeddings lived in Pinecone. Every query required:

  1. Query Pinecone for similar vectors
  2. Get document IDs from results
  3. Fetch full documents from PostgreSQL
  4. Merge and return results

This dual-database pattern created synchronization challenges, increased latency, and complicated our codebase. When documents were deleted, we had to remember to delete from both systems. When reindexing, we had to coordinate across services.
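The four-step flow above can be sketched as follows. The `VectorStore` and `DocStore` interfaces are simplified stand-ins for the Pinecone and Prisma clients (not their real APIs), injected so the two round trips are easy to see:

```typescript
// Sketch of the old dual-database read path (simplified).
// VectorStore/DocStore are illustrative stand-ins, not real client APIs.
interface VectorStore {
  query(req: { vector: number[]; topK: number }): Promise<{ matches: { id: string }[] }>;
}
interface DocStore {
  findMany(ids: string[]): Promise<{ id: string; content: string }[]>;
}

async function searchOld(
  vectors: VectorStore,
  docs: DocStore,
  queryEmbedding: number[],
): Promise<{ id: string; content: string }[]> {
  // 1. Query the vector store for similar vectors
  const res = await vectors.query({ vector: queryEmbedding, topK: 10 });
  // 2. Collect document IDs from the results
  const ids = res.matches.map(m => m.id);
  // 3. Fetch full documents from the relational store (second round trip)
  const rows = await docs.findMany(ids);
  // 4. Merge, preserving similarity order; drop IDs with no matching row
  const byId = new Map<string, { id: string; content: string }>();
  for (const row of rows) byId.set(row.id, row);
  return ids.flatMap(id => {
    const row = byId.get(id);
    return row ? [row] : [];
  });
}
```

Note the merge step has to re-sort by the vector store's ranking, because the relational fetch returns rows in arbitrary order — one more thing pgvector's single query makes unnecessary.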

Evaluating Alternatives

We evaluated four main alternatives to Pinecone:

Weaviate: Excellent vector database with built-in vectorization. However, running it ourselves added operational complexity, and the managed offering had similar cost concerns.

Milvus: Powerful and scalable, but designed for much larger deployments than we needed. The operational overhead didn't match our team size.

Qdrant: Modern and performant, but relatively new. We were concerned about long-term stability and ecosystem maturity.

pgvector: PostgreSQL extension for vector similarity search. Adds vector capabilities to our existing database.

We chose pgvector for several compelling reasons.

Why pgvector Won

1. Unified Data Layer

The most significant advantage was eliminating the dual-database problem. With pgvector, our vectors live alongside our relational data in the same database, same transactions, same backups.

```sql
SELECT content,
       embedding <=> $3::vector AS distance
FROM documents
WHERE chatbot_id = $1
  AND organization_id = $2
  AND deleted_at IS NULL
ORDER BY distance
LIMIT 10;
```

This single query combines vector similarity search with relational filtering — something that previously required multiple round trips. The simplification was dramatic.

2. Mature Ecosystem

PostgreSQL has 30+ years of battle-tested reliability. By building on PostgreSQL, we inherited:

  • ACID transactions: Our vector operations are fully transactional
  • Point-in-time recovery: We can restore vectors to any moment in time
  • Connection pooling: PgBouncer works identically for vector queries
  • Replication: Our vectors are replicated with the rest of our data
  • Monitoring: Existing tools (pg_stat, EXPLAIN ANALYZE) work perfectly
  • Team expertise: Every developer knows PostgreSQL

We didn't have to learn a new database, configure new monitoring, or maintain separate backup procedures.

3. Dramatic Cost Efficiency

Our new infrastructure costs $90/month on a managed PostgreSQL instance (we use Supabase). That's a 97% reduction from our Pinecone bill.

The math is straightforward: managed PostgreSQL instances are commodity infrastructure with mature, competitive pricing. Vector databases are specialized tools with specialized pricing.

The Migration Process

Migrating production data between databases requires careful planning. We broke the migration into four phases over three weeks.

Phase 1: Schema Design (Week 1)

We added a vector column to our existing documents table:

```sql
-- Add the vector column
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Create an index for similarity search
CREATE INDEX documents_embedding_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

We chose ivfflat indexing over hnsw because:

  • Lower memory overhead for our scale
  • Simpler to tune and maintain
  • Excellent performance up to ~10M vectors

The lists = 100 parameter determines the number of clusters. We used the rule of thumb of sqrt(num_vectors) and planned to adjust as we grew.
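The rule of thumb is easy to encode. This helper is our own illustration, not part of pgvector — treat its output as a starting point to tune from monitoring:

```typescript
// Rule-of-thumb starting point for ivfflat's `lists` parameter:
// roughly sqrt(number of vectors), with a small floor.
// (Illustrative helper, not a pgvector API.)
function suggestLists(numVectors: number): number {
  return Math.max(10, Math.round(Math.sqrt(numVectors)));
}
```

For example, `suggestLists(1_000_000)` suggests 1000 lists, while a small 10K-vector table lands at 100.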

Phase 2: Backfill (Week 1-2)

We wrote a migration script to copy all existing vectors from Pinecone to PostgreSQL:

```typescript
async function backfillVectors() {
  const batchSize = 1000;
  let cursor = null;

  do {
    // Fetch a batch of vectors from Pinecone
    const response = await pinecone.list({ limit: batchSize, cursor });
    const ids = response.vectors.map(v => v.id);
    const vectors = await pinecone.fetch({ ids });

    // Batch insert into PostgreSQL
    const updates = Object.entries(vectors).map(([id, vector]) => ({
      where: { id },
      data: { embedding: vector.values },
    }));
    await prisma.$transaction(updates.map(u => prisma.document.update(u)));

    cursor = response.nextCursor;
    console.log(`Migrated ${ids.length} vectors`);
  } while (cursor);
}
```

The backfill ran over a weekend, processing approximately 500,000 vectors per hour. We monitored for errors and reran failed batches.

Phase 3: Dual-Write (Week 2)

During the transition period, we wrote to both databases simultaneously. This ensured pgvector stayed in sync while we validated results:

```typescript
async function indexDocument(doc: Document, embedding: number[]) {
  // Write to both systems in parallel
  await Promise.all([
    pinecone.upsert([{
      id: doc.id,
      values: embedding,
      metadata: { chatbotId: doc.chatbotId },
    }]),
    prisma.document.update({
      where: { id: doc.id },
      data: { embedding },
    }),
  ]);
}
```

We also built a validation system that ran queries against both databases and compared results. This caught several edge cases before they became production issues.
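The core of such a comparison can be as simple as measuring top-K overlap between the two systems' ID lists. This is a minimal sketch in our own naming, not Chatsy's exact validation code:

```typescript
// Fraction of the baseline's top-K IDs that the candidate system also
// returned (recall@K). 1.0 means identical result sets.
// (Illustrative sketch; thresholds and naming are our own.)
function overlapAtK(baselineIds: string[], candidateIds: string[]): number {
  const expected = new Set(baselineIds);
  if (expected.size === 0) return 1;
  // Dedupe candidates so repeated IDs aren't double-counted
  const hits = new Set(candidateIds.filter(id => expected.has(id))).size;
  return hits / expected.size;
}
```

Running this per query and alerting when the score drops below a chosen threshold (say 0.999) is one way to catch divergence before cutover.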

Phase 4: Cutover (Week 3)

After validating that pgvector results matched Pinecone results at 99.9% accuracy, we switched reads to pgvector:

```typescript
// Before: Pinecone query
const results = await pinecone.query({
  vector: queryEmbedding,
  topK: 10,
  filter: { chatbotId },
});

// After: pgvector query
const results = await prisma.$queryRaw<DocumentResult[]>`
  SELECT id, content, metadata,
         embedding <=> ${queryEmbedding}::vector AS distance
  FROM documents
  WHERE chatbot_id = ${chatbotId}
    AND embedding IS NOT NULL
  ORDER BY distance
  LIMIT 10
`;
```

We kept Pinecone running in read-only mode for two weeks as a fallback, then decommissioned it entirely.

Performance Results

The performance improvements surprised us. We expected pgvector to be slightly slower — a reasonable trade-off for the cost savings. Instead:

| Metric       | Pinecone | pgvector | Change |
|--------------|----------|----------|--------|
| P50 Latency  | 45ms     | 38ms     | -16%   |
| P95 Latency  | 120ms    | 85ms     | -29%   |
| P99 Latency  | 250ms    | 150ms    | -40%   |
| Monthly Cost | $3,000   | $90      | -97%   |

Yes, pgvector is actually faster for our use case. The primary reason: co-location of vector and metadata eliminates network round trips. When both data types live in the same database, there's no network latency between the similarity search and the metadata fetch.

Lessons Learned

After running pgvector in production for six months, here are our key takeaways:

Start with IVFFlat, Consider HNSW Later

As we discuss in building for scale, IVFFlat indexing is simpler, uses less memory, and works excellently up to ~10M vectors. HNSW provides better accuracy at large scale but requires significantly more memory. Don't prematurely optimize.
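For reference, the HNSW alternative looks like this in pgvector (0.5.0+); `m` and `ef_construction` are shown at their defaults:

```sql
-- HNSW index: better recall/latency at large scale, higher memory use.
-- m = 16 and ef_construction = 64 are pgvector's defaults.
CREATE INDEX documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```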

Tune Your Lists Parameter

The lists parameter in IVFFlat determines the number of clusters. Too few lists means slow queries (scanning too many vectors). Too many lists means slow inserts (updating too many clusters). We use sqrt(num_vectors) as a starting point and adjust based on monitoring.
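`lists` is set at index-build time, but pgvector also exposes a query-time knob: `ivfflat.probes` controls how many clusters each query scans (default 1). Raising it trades latency for recall:

```sql
-- Scan 10 clusters per query instead of the default 1:
-- higher recall, proportionally more work per query.
SET ivfflat.probes = 10;
```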

Use Partial Indexes for Multi-Tenancy

If you filter by tenant (which we always do), create partial indexes:

```sql
CREATE INDEX documents_embedding_org1_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WHERE organization_id = 'org_123';
```

This dramatically improves query performance for large tenants.

Monitor Vacuum Aggressively

Vector columns are large (1536 dimensions × 4 bytes = 6KB per row). Dead tuples accumulate faster than with typical data. We run aggressive vacuuming to prevent bloat.
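One way to do this is with per-table autovacuum settings, so the documents table is vacuumed well before the database-wide thresholds trigger. The values below are illustrative starting points, not our exact configuration:

```sql
-- Vacuum after ~5% of rows are dead (default is 20%),
-- and re-analyze after ~2% change (default is 10%).
ALTER TABLE documents SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);
```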

Plan for Index Rebuilds

IVFFlat indexes need rebuilding when data distribution changes significantly. We rebuild indexes quarterly, scheduled during low-traffic periods.
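On PostgreSQL 12 and later, the rebuild itself need not block reads or writes:

```sql
-- Builds a replacement index side-by-side, then swaps it in atomically.
REINDEX INDEX CONCURRENTLY documents_embedding_idx;
```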

Conclusion

The migration took 3 weeks of engineering time and saved us $35,000/year. More importantly, it simplified our architecture, reduced operational complexity, and actually improved performance.

Not every workload is suitable for pgvector. If you need billion-scale vectors with sub-10ms latency, dedicated vector databases still make sense. But for most applications — especially those already using PostgreSQL — pgvector is the pragmatic choice, particularly when combined with hybrid search for optimal retrieval quality.

For more on how we build Chatsy's infrastructure, check out our posts on hybrid search and building for scale.



Want to See pgvector-Powered Search in Action?

Chatsy's AI agents use this exact pgvector architecture to deliver fast, accurate retrieval over your knowledge base — no external vector database required. If you're building support automation on PostgreSQL, you're already halfway there.



Frequently Asked Questions

Why migrate from Pinecone to pgvector?

Pinecone costs scale linearly with vector count and query volume — our bill grew from $100 to $3,000/month. pgvector eliminates that by using your existing PostgreSQL instance. You also remove the dual-database problem: vectors and metadata live together, so no sync logic, no extra round trips, and simpler architecture.

Is pgvector production-ready?

Yes. Chatsy migrated to pgvector and has run it in production for six months. PostgreSQL offers ACID transactions, point-in-time recovery, replication, and mature tooling. For most applications under ~10M vectors, pgvector with IVFFlat indexing is stable and performant. Use HNSW for larger scale if needed.

How does pgvector performance compare to Pinecone?

In our migration, pgvector was faster: P95 latency dropped from 120ms to 85ms. Co-locating vectors with metadata eliminates network round trips between the vector DB and PostgreSQL. P50 improved 16% and P99 improved 40%, while monthly cost dropped 97% ($3,000 to $90).

What are the migration steps?

The process has four phases: (1) add a vector column and IVFFlat index to PostgreSQL, (2) backfill vectors from Pinecone in batches, (3) dual-write to both systems and validate results, (4) switch reads to pgvector and decommission Pinecone. Plan for about 3 weeks of engineering time.

How much can I save by migrating to pgvector?

We cut vector database costs by 97% — from $3,000/month to $90/month on a managed PostgreSQL instance. That’s roughly $35,000/year in savings. Managed PostgreSQL is commodity infrastructure with competitive pricing; dedicated vector databases carry a premium that may not be justified for typical workloads.

