
Why We Migrated from Pinecone to pgvector: A 97% Cost Reduction Story

How we achieved massive cost savings while improving performance by moving to PostgreSQL with pgvector extension. A detailed technical guide on vector database migration.

Asad Ali
Founder & CEO
December 10, 2024
8 min read

When we started Chatsy, we chose Pinecone for vector storage. It was the obvious choice — purpose-built for vector search, great developer experience, and excellent performance. The managed service meant we could focus on building our product rather than managing infrastructure.

But as we scaled, our Pinecone bill grew from $100/month to over $3,000/month. With hundreds of chatbots and millions of document chunks, the cost trajectory was unsustainable. We knew there had to be a better way.

This is the story of how we migrated to pgvector, reduced our costs by 97%, and actually improved performance in the process.

The Case for Migration

Before diving into the technical details, let's establish why we decided to migrate in the first place. This wasn't a decision we took lightly — Pinecone was working well technically. The issues were primarily economic and architectural.

The Cost Problem

Our Pinecone usage followed a predictable pattern: every new customer meant more vectors, and more vectors meant higher bills. The pricing model, based on vector count and query volume, created a linear relationship between growth and cost.

| Month | Vectors | Monthly Cost |
|-------|---------|--------------|
| Jan   | 500K    | $100         |
| Mar   | 1.2M    | $450         |
| Jun   | 3.5M    | $1,200       |
| Sep   | 8M      | $3,000       |

At this trajectory, we were looking at $10,000+ monthly bills within a year. For a startup, that's a significant burn rate for a single infrastructure component.

The Architecture Problem

Beyond cost, we faced an architectural challenge: data synchronization. Our document metadata lived in PostgreSQL, but the embeddings lived in Pinecone. Every query required:

  1. Query Pinecone for similar vectors
  2. Get document IDs from results
  3. Fetch full documents from PostgreSQL
  4. Merge and return results

This dual-database pattern created synchronization challenges, increased latency, and complicated our codebase. When documents were deleted, we had to remember to delete from both systems. When reindexing, we had to coordinate across services.
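For illustration, the merge step of that four-part flow can be sketched as a small pure function. The `PineconeMatch` and `DocumentRow` shapes here are hypothetical stand-ins, not our production types:

```typescript
// Hypothetical shapes for illustration; not our production types.
interface PineconeMatch { id: string; score: number }
interface DocumentRow { id: string; content: string }

// Step 4 of the flow above: join vector-search hits with the rows
// fetched from PostgreSQL, preserving similarity order.
function mergeResults(
  matches: PineconeMatch[],
  rows: DocumentRow[],
): Array<DocumentRow & { score: number }> {
  const byId = new Map(rows.map((r) => [r.id, r]));
  return matches
    // A row may have been deleted in one system but not the other —
    // exactly the synchronization hazard described above.
    .filter((m) => byId.has(m.id))
    .map((m) => ({ ...byId.get(m.id)!, score: m.score }));
}
```

Even this simplified version has to handle the case where the two systems disagree, which is the crux of the problem.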

Evaluating Alternatives

We evaluated four main alternatives to Pinecone:

Weaviate: Excellent vector database with built-in vectorization. However, running it ourselves added operational complexity, and the managed offering had similar cost concerns.

Milvus: Powerful and scalable, but designed for much larger deployments than we needed. The operational overhead didn't match our team size.

Qdrant: Modern and performant, but relatively new. We were concerned about long-term stability and ecosystem maturity.

pgvector: PostgreSQL extension for vector similarity search. Adds vector capabilities to our existing database.

We chose pgvector for several compelling reasons.

Why pgvector Won

1. Unified Data Layer

The most significant advantage was eliminating the dual-database problem. With pgvector, our vectors live alongside our relational data in the same database, same transactions, same backups.

```sql
SELECT content,
       embedding <=> query_embedding AS distance
FROM documents
WHERE chatbot_id = $1
  AND organization_id = $2
  AND deleted_at IS NULL
ORDER BY distance
LIMIT 10;
```

This single query combines vector similarity search with relational filtering — something that previously required multiple round trips. The simplification was dramatic.

2. Mature Ecosystem

PostgreSQL has 30+ years of battle-tested reliability. By building on PostgreSQL, we inherited:

  • ACID transactions: Our vector operations are fully transactional
  • Point-in-time recovery: We can restore vectors to any moment in time
  • Connection pooling: PgBouncer works identically for vector queries
  • Replication: Our vectors are replicated with the rest of our data
  • Monitoring: Existing tools (pg_stat, EXPLAIN ANALYZE) work perfectly
  • Team expertise: Every developer knows PostgreSQL

We didn't have to learn a new database, configure new monitoring, or maintain separate backup procedures.

3. Dramatic Cost Efficiency

Our new infrastructure costs $90/month on a managed PostgreSQL instance (we use Supabase). That's a 97% reduction from our Pinecone bill.

The math is straightforward: managed PostgreSQL instances are commodity infrastructure with mature, competitive pricing. Vector databases are specialized tools with specialized pricing.

The Migration Process

Migrating production data between databases requires careful planning. We broke the migration into four phases over three weeks.

Phase 1: Schema Design (Week 1)

We added a vector column to our existing documents table:

```sql
-- Add the vector column
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Create an index for similarity search
CREATE INDEX documents_embedding_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

We chose ivfflat indexing over hnsw because:

  • Lower memory overhead for our scale
  • Simpler to tune and maintain
  • Excellent performance up to ~10M vectors

The lists = 100 parameter determines the number of clusters. We used the rule of thumb of sqrt(num_vectors) and planned to adjust as we grew.
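As a sketch, the pgvector documentation's guidance — roughly rows / 1000 for up to about a million rows, and sqrt(rows) beyond that (the sqrt rule mentioned above is the large-table half of that guidance) — can be captured in a small helper. `chooseLists` is a hypothetical name, not part of pgvector:

```typescript
// Hypothetical helper based on pgvector's documented guidance:
// lists ≈ rows / 1000 for up to ~1M rows, sqrt(rows) above that.
function chooseLists(numVectors: number): number {
  if (numVectors <= 1_000_000) {
    return Math.max(1, Math.round(numVectors / 1000));
  }
  return Math.round(Math.sqrt(numVectors));
}
```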

Phase 2: Backfill (Week 1-2)

We wrote a migration script to copy all existing vectors from Pinecone to PostgreSQL:

```typescript
async function backfillVectors() {
  const batchSize = 1000;
  let cursor = null;

  do {
    // Fetch a batch of vector IDs from Pinecone
    const response = await pinecone.list({ limit: batchSize, cursor });
    const ids = response.vectors.map(v => v.id);
    const vectors = await pinecone.fetch({ ids });

    // Batch-update the matching rows in PostgreSQL
    const updates = Object.entries(vectors).map(([id, vector]) => ({
      where: { id },
      data: { embedding: vector.values }
    }));
    await prisma.$transaction(
      updates.map(u => prisma.document.update(u))
    );

    cursor = response.nextCursor;
    console.log(`Migrated ${ids.length} vectors`);
  } while (cursor);
}
```

The backfill ran over a weekend, processing approximately 500,000 vectors per hour. We monitored for errors and reran failed batches.
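We won't reproduce our exact error handling here, but the "rerun failed batches" idea can be sketched as a generic retry wrapper. `withRetry` is a hypothetical helper, not a function from our codebase:

```typescript
// Hypothetical retry helper: run a batch operation, retrying on
// failure up to a capped number of attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // In production you would also log the failure and back off here.
    }
  }
  throw lastError;
}
```

Wrapping each batch in something like this meant a transient Pinecone or PostgreSQL error cost us one retry rather than a restarted migration.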

Phase 3: Dual-Write (Week 2)

During the transition period, we wrote to both databases simultaneously. This ensured pgvector stayed in sync while we validated results:

```typescript
async function indexDocument(doc: Document, embedding: number[]) {
  // Write to both systems in parallel
  await Promise.all([
    pinecone.upsert([{
      id: doc.id,
      values: embedding,
      metadata: { chatbotId: doc.chatbotId }
    }]),
    prisma.document.update({
      where: { id: doc.id },
      data: { embedding }
    })
  ]);
}
```

We also built a validation system that ran queries against both databases and compared results. This caught several edge cases before they became production issues.
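One simple check such a validation system can run is top-k overlap between the two result lists for the same query. A sketch, with a hypothetical function name:

```typescript
// Hypothetical validation metric: the fraction of Pinecone's top-k
// result IDs that also appear in pgvector's top-k for the same query.
function topKOverlap(pineconeIds: string[], pgvectorIds: string[]): number {
  if (pineconeIds.length === 0) return 1;
  const pg = new Set(pgvectorIds);
  const hits = pineconeIds.filter((id) => pg.has(id)).length;
  return hits / pineconeIds.length;
}
```

Because approximate indexes on both sides can legitimately return slightly different neighbor sets, comparing ID overlap (rather than exact ordering) avoids flagging harmless rank swaps as failures.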

Phase 4: Cutover (Week 3)

After validating that pgvector results matched Pinecone results at 99.9% accuracy, we switched reads to pgvector:

```typescript
// Before: Pinecone query
const results = await pinecone.query({
  vector: queryEmbedding,
  topK: 10,
  filter: { chatbotId }
});

// After: pgvector query
// The embedding array is serialized to pgvector's '[1,2,3]' text form
// before casting, since a raw number[] parameter can't be cast directly.
const results = await prisma.$queryRaw<DocumentResult[]>`
  SELECT id, content, metadata,
         embedding <=> ${JSON.stringify(queryEmbedding)}::vector AS distance
  FROM documents
  WHERE chatbot_id = ${chatbotId}
    AND embedding IS NOT NULL
  ORDER BY distance
  LIMIT 10
`;
```

We kept Pinecone running in read-only mode for two weeks as a fallback, then decommissioned it entirely.

Performance Results

The performance improvements surprised us. We expected pgvector to be slightly slower — a reasonable trade-off for the cost savings. Instead:

| Metric       | Pinecone | pgvector | Change |
|--------------|----------|----------|--------|
| P50 Latency  | 45ms     | 38ms     | -16%   |
| P95 Latency  | 120ms    | 85ms     | -29%   |
| P99 Latency  | 250ms    | 150ms    | -40%   |
| Monthly Cost | $3,000   | $90      | -97%   |

Yes, pgvector is actually faster for our use case. The primary reason: co-location of vector and metadata eliminates network round trips. When both data types live in the same database, there's no network latency between the similarity search and the metadata fetch.

Lessons Learned

After running pgvector in production for six months, here are our key takeaways:

Start with IVFFlat, Consider HNSW Later

IVFFlat indexing is simpler, uses less memory, and works excellently up to ~10M vectors. HNSW provides better accuracy at large scale but requires significantly more memory. Don't prematurely optimize.
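For reference, moving to HNSW later is a one-index change. A sketch using pgvector's HNSW syntax — the `m` and `ef_construction` values shown are pgvector's defaults, not tuned numbers:

```sql
-- Sketch only: an HNSW equivalent of the ivfflat index above.
-- m and ef_construction are shown at pgvector's default values.
CREATE INDEX documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```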

Tune Your Lists Parameter

The lists parameter in IVFFlat determines the number of clusters. Too few lists means slow queries (scanning too many vectors). Too many lists means slow inserts (updating too many clusters). We use sqrt(num_vectors) as a starting point and adjust based on monitoring.
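The query-time counterpart is `ivfflat.probes`, which controls how many clusters each query scans — raising it trades latency for recall. A sketch:

```sql
-- Scan more clusters per query for better recall (the default is 1).
SET ivfflat.probes = 10;

-- Or scoped to one transaction, so a single expensive query
-- doesn't change the setting for the whole session:
BEGIN;
SET LOCAL ivfflat.probes = 20;
-- ... run the similarity query here ...
COMMIT;
```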

Use Partial Indexes for Multi-Tenancy

If you filter by tenant (which we always do), create partial indexes:

```sql
CREATE INDEX documents_embedding_org1_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WHERE organization_id = 'org_123';
```

This dramatically improves query performance for large tenants.

Monitor Vacuum Aggressively

Vector columns are large (1536 dimensions × 4 bytes = 6KB per row). Dead tuples accumulate faster than with typical data. We run aggressive vacuuming to prevent bloat.
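That per-row arithmetic generalizes. A quick sketch for estimating raw embedding storage — it ignores row headers, TOAST, and the index itself, so treat the result as a floor, not a forecast:

```typescript
// Rough raw-storage estimate for float4 embeddings:
// dimensions × 4 bytes per vector, ignoring row headers,
// TOAST overhead, and the index.
function embeddingStorageBytes(numVectors: number, dimensions = 1536): number {
  return numVectors * dimensions * 4;
}
```

At our September scale of 8M vectors, that lower bound alone is roughly 49 GB of embedding data, which is why dead tuples hurt so quickly.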

Plan for Index Rebuilds

IVFFlat indexes need rebuilding when data distribution changes significantly. We rebuild indexes quarterly, scheduled during low-traffic periods.
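The rebuild itself is a single statement; on PostgreSQL 12+ it can run without blocking reads and writes (sketch):

```sql
-- Rebuild the ivfflat index without taking exclusive locks
-- (CONCURRENTLY requires PostgreSQL 12 or later).
REINDEX INDEX CONCURRENTLY documents_embedding_idx;
```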

Conclusion

The migration took 3 weeks of engineering time and saved us $35,000/year. More importantly, it simplified our architecture, reduced operational complexity, and actually improved performance.

Not every workload is suitable for pgvector. If you need billion-scale vectors with sub-10ms latency, dedicated vector databases still make sense. But for most applications — especially those already using PostgreSQL — pgvector is the pragmatic choice.

For more on how we build Chatsy's infrastructure, check out our posts on hybrid search and building for scale.


Tags:
#infrastructure
#postgresql
#pgvector
#cost-optimization

