Benchmark anxiety is the leading cause of premature vector database adoption. Teams read the comparison posts, run a few synthetic tests, and schedule a migration - before retrieval has become a real production constraint. The bar for actually needing a dedicated vector store is higher than those posts imply. This is the decision framework the comparison articles skip.
The default that works
pgvector handles more workloads than the comparison articles suggest. In May 2025, JustSoftLab ran a production benchmark against Pinecone at 47 million vectors - a legal document RAG system with real query patterns on 2.3 million documents. pgvectorscale (Timescale's extension on top of pgvector) reached 471 queries per second at 99% recall. Qdrant achieved 41 QPS at the same recall threshold.
That number matters because most benchmark anxiety starts well below 47 million vectors. Teams read the comparison posts and decide they need a dedicated vector store at 100,000 embeddings. They are making an infrastructure decision based on a scale they have not reached and may not reach for another year.
The operational case for staying on Postgres is straightforward: your embeddings sit next to your user records, documents, permissions, and audit logs. Joins work. Transactions work. Backups cover everything. Access control is unified. Introducing a second database means sync logic, dual-write handling, backup coordination, and eventual consistency between two systems. That complexity is real, and it arrives on day one of the migration - not at the scale that would have justified the move.
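To make the dual-write cost concrete, here is a minimal sketch of what goes wrong when the second write fails. It is an illustration only: `FlakyVectorStore` and `dual_write` are hypothetical names, the "primary" is a plain dictionary standing in for Postgres, and no real client library is involved.

```python
class FlakyVectorStore:
    """Stand-in for a second database that rejects some writes (hypothetical)."""

    def __init__(self, fail_on):
        self.rows = {}
        self.fail_on = set(fail_on)

    def upsert(self, doc_id, embedding):
        if doc_id in self.fail_on:
            raise ConnectionError(f"vector store unavailable for {doc_id}")
        self.rows[doc_id] = embedding


def dual_write(primary, vector_store, doc_id, text, embedding):
    """Naive dual write: the primary write is not undone if the second write fails."""
    primary[doc_id] = text
    try:
        vector_store.upsert(doc_id, embedding)
    except ConnectionError:
        return False  # the caller now needs a retry queue or a reconciliation job
    return True


primary = {}
store = FlakyVectorStore(fail_on={"doc-2"})
for doc_id in ("doc-1", "doc-2", "doc-3"):
    dual_write(primary, store, doc_id, "text", [0.1, 0.2])

# Documents that exist in the primary but are invisible to vector search.
missing = sorted(set(primary) - set(store.rows))
print(missing)
```

With embeddings in the same Postgres table as the document, that drift cannot occur, because both values are written in one transaction.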
If you want the full case for starting with Postgres, I covered it in Why PostgreSQL Is Your Best Bet for AI Projects. This article is the companion piece: what to do when the signals arrive that Postgres is no longer enough.
What the comparison articles miss
The New Stack published a piece in 2025 under the headline "Why pgvector Benchmarks Lie." The argument was not that pgvector is being misrepresented - it was that all vector database benchmarks test the wrong thing.
Isolated search throughput tests measure clean datasets with simple query patterns. Production systems are not clean. They combine vector similarity search with metadata filters, access control lists, reranking, schema evolution, retries, observability instrumentation, and query interference from everything else running on the same infrastructure. Benchmarks strip all of that out. The result looks like a performance decision. It rarely is.
A more instructive data point came from VentureBeat in Q1 2026. Enterprise intent to adopt hybrid retrieval tripled from 10.3% to 33.3% in a single quarter. The obvious reading is momentum. The less obvious reading is in a second figure from the same report: 22% of qualified enterprise respondents reported having no production RAG systems at all. The vector database comparison conversation is happening several steps ahead of where most teams actually are.
There is still a version of the Qdrant vs Pinecone question worth asking once you reach it. Qdrant achieves 22ms p95 latency at 10 million vectors in clean benchmark conditions; Pinecone runs at 45ms. With complex metadata filters, both degrade. What remains after the benchmarks wash out is not a performance question - it is an operating model question.
The teams that did move fast paid a different price. VentureBeat's broader finding was that organisations which went wide on RAG in 2025 are hitting the same failure point: architectures built for document retrieval that do not hold at agentic scale. The rebuild is not a database choice problem. It is an architecture problem that a fancier database cannot fix.
The three signals that justify moving
There are three specific, observable conditions that justify moving off pgvector. Each is measurable before it becomes painful, and most teams that begin evaluating alternatives have not hit any of them.
Signal 1 - HNSW index pressure on non-vector queries
When your HNSW index starts causing visible p95 degradation on queries that have nothing to do with vector search - standard reads, writes, joins - the vector workload is competing for memory and CPU with the rest of your application. You will see this in query plan times before your users notice it.
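One way to make this signal measurable is to compare p95 latency for a non-vector query before and after the vector workload grew. The sketch below assumes you can export per-query latencies in milliseconds from your own monitoring. The function names, the sample values, and the 1.25 ratio are all illustrative assumptions, not thresholds taken from any of the benchmarks cited here.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a list of latencies in milliseconds."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


def degraded(baseline_ms, current_ms, max_ratio=1.25):
    """True when the current p95 exceeds the baseline p95 by more than max_ratio (assumed)."""
    return p95(current_ms) > max_ratio * p95(baseline_ms)


# Hypothetical latencies for a plain primary-key lookup, not a vector query.
baseline = [4, 5, 5, 6, 6, 7, 7, 8, 9, 12]
current = [4, 5, 6, 7, 9, 11, 14, 18, 25, 40]

print(p95(baseline), p95(current), degraded(baseline, current))
```

Run the comparison on queries that never touch the embedding column. If those are the ones slowing down, the vector index is competing with the rest of the application.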
Before treating this as a migration trigger: most teams that see this signal have not tuned their HNSW parameters. Adjusting ef_search and m recovers 40 to 60% of lost performance in the majority of cases. Tune before you migrate.
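Tuning is easier to judge with a recall number than by eye. A minimal sketch, assuming you have collected two lists of result ids for the same query: one from the HNSW index and one from an exact scan with the index disabled (for example, by turning off `enable_indexscan` inside a transaction). The ids below are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours that the approximate query also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


# Hypothetical result ids for a single query.
exact = [11, 42, 7, 93, 5, 68, 30, 2, 77, 19]       # exact scan, index disabled
at_ef_40 = [11, 42, 7, 93, 5, 68, 30, 51, 64, 88]   # HNSW with ef_search = 40
at_ef_100 = [11, 42, 7, 93, 5, 68, 30, 2, 77, 64]   # HNSW with ef_search = 100

print(recall_at_k(at_ef_40, exact), recall_at_k(at_ef_100, exact))
```

Averaging this over a sample of real production queries gives a recall figure to weigh against the latency cost of each ef_search value.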
-- Tune before you migrate - most teams skip this step entirely
-- ef_search: higher = better recall, slower queries (default: 40)
-- m: higher = better recall, more memory at index build time (default: 16)

-- Step 1: test ef_search first (no index rebuild needed)
-- SET LOCAL only takes effect inside a transaction block
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- try values between 80 and 200
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
COMMIT;

-- Step 2: if recall improves enough, persist it at the role level (applies to new sessions)
ALTER ROLE your_app_user SET hnsw.ef_search = 100;

-- Step 3: if you still need more, rebuild the index with a higher m
-- Warning: the rebuild takes time on large corpora, and until the new index
-- is built, vector queries fall back to sequential scans
DROP INDEX CONCURRENTLY IF EXISTS documents_embedding_idx;
CREATE INDEX CONCURRENTLY documents_embedding_idx
  ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 128);
Signal 2 - Selective metadata filters degrading retrieval quality
This is where pgvector's limits show up earliest in production RAG systems. When filtering becomes central to retrieval - not just an occasional condition but the primary way you scope results - pgvector's combined filter and vector performance degrades in ways that are difficult to tune away. Query plans show full-index scans despite indexes on the filter columns. The vector index and the row filter cannot cooperate efficiently.
Qdrant handles this natively via payload indexing: the filter is applied at the vector index level rather than post-hoc. If your RAG system narrows results by tenant, date range, document type, or access tier on every query, you have likely already hit this limit or are close to it.
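The difference between filtering after the search and filtering as part of it can be shown with a brute-force toy. This is not how pgvector or Qdrant are implemented: it only mimics the shape of the problem, with a fixed-size candidate list standing in for an approximate index. The corpus, the function names, and the `ef_search` value of 40 are assumptions chosen for illustration.

```python
import math


def post_filter_search(docs, query, predicate, k, ef_search):
    """Rank everything, keep only ef_search candidates, then apply the filter."""
    ranked = sorted(docs, key=lambda d: math.dist(d["embedding"], query))
    candidates = ranked[:ef_search]
    return [d for d in candidates if predicate(d)][:k]


def pre_filter_search(docs, query, predicate, k):
    """Apply the filter first, then rank only the rows that match."""
    matching = [d for d in docs if predicate(d)]
    return sorted(matching, key=lambda d: math.dist(d["embedding"], query))[:k]


# Toy corpus: 1,000 documents spread evenly over 100 tenants (1% selectivity each).
docs = [
    {"id": i, "tenant_id": i % 100, "embedding": [float(i), 0.0]}
    for i in range(1000)
]
query = [0.0, 0.0]


def is_tenant_7(doc):
    return doc["tenant_id"] == 7


post = post_filter_search(docs, query, is_tenant_7, k=10, ef_search=40)
pre = pre_filter_search(docs, query, is_tenant_7, k=10)
print(len(post), len(pre))  # 1 10
```

Asked for ten results, the post-filter path returns one, because only one of the 40 nearest candidates belongs to the tenant. Raising the candidate count fixes the result count and pays for it in latency, which is the trade-off that becomes hard to tune as filters get more selective.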
# pgvector: filter is applied AFTER the vector index traversal (post-hoc)
# With selective filters, this forces a near full-index scan
# EXPLAIN ANALYZE will show "Index Scan" becoming "Seq Scan" as filters tighten
results = await db.fetch("""
    SELECT id, content, 1 - (embedding <=> $3::vector) AS similarity
    FROM documents
    WHERE tenant_id = $1   -- selective filter: ~0.1% of rows
      AND doc_type = $2    -- selective filter: ~5% of rows
    ORDER BY embedding <=> $3::vector
    LIMIT 10
""", tenant_id, doc_type, query_embedding)

# ---------------------------------------------------------------------
# Qdrant: filter is applied INSIDE the vector index traversal
# payload indexes narrow the candidate set before ANN search begins
# no full-index scan, consistently sub-30ms even with selective filters
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url="http://localhost:6333")

results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="tenant_id", match=MatchValue(value=tenant_id)),
            FieldCondition(key="doc_type", match=MatchValue(value=doc_type)),
        ]
    ),
    limit=10,
)
The payload index needs to be created explicitly on the fields you filter most:
from qdrant_client import QdrantClient
from qdrant_client.models import PayloadSchemaType

client = QdrantClient(url="http://localhost:6333")

# Create payload indexes on the fields central to your filters
# Qdrant uses these during ANN traversal, not as a post-search step
client.create_payload_index(
    collection_name="documents",
    field_name="tenant_id",
    field_schema=PayloadSchemaType.KEYWORD,
)
client.create_payload_index(
    collection_name="documents",
    field_name="doc_type",
    field_schema=PayloadSchemaType.KEYWORD,
)
client.create_payload_index(
    collection_name="documents",
    field_name="created_at",
    field_schema=PayloadSchemaType.DATETIME,
)
Signal 3 - Vector workload delaying writes
The most visible signal, and usually the last to arrive. When search requests start adding latency to write operations on the same instance, the retrieval workload has grown large enough to create resource contention. This is the clearest argument for workload isolation.
If you are waiting for Signal 3 before evaluating alternatives, you have probably been living with Signal 1 and Signal 2 for a while. Use the earlier signals as the trigger to evaluate, not the later one.
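A rough way to check for Signal 3 is to split monitoring windows by search load and compare write latency on each side. The sketch below assumes per-minute aggregates with a search QPS figure and a write latency figure. The field names, the sample data, and the threshold of 100 QPS are hypothetical.

```python
from statistics import median


def write_latency_by_search_load(windows, qps_threshold):
    """Median write latency (ms) for quiet windows versus busy search windows."""
    quiet = [w["write_ms"] for w in windows if w["search_qps"] < qps_threshold]
    busy = [w["write_ms"] for w in windows if w["search_qps"] >= qps_threshold]
    return median(quiet), median(busy)


# Hypothetical per-minute aggregates from the same Postgres instance.
windows = [
    {"search_qps": 20, "write_ms": 8},
    {"search_qps": 25, "write_ms": 9},
    {"search_qps": 30, "write_ms": 8},
    {"search_qps": 180, "write_ms": 21},
    {"search_qps": 210, "write_ms": 26},
    {"search_qps": 240, "write_ms": 34},
]

quiet_ms, busy_ms = write_latency_by_search_load(windows, qps_threshold=100)
print(quiet_ms, busy_ms)
```

A persistent gap between the two medians suggests contention, though it does not prove it: write volume often rises at the same time as search traffic, so check that before blaming the vector workload.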
If you do need to move: Qdrant vs Pinecone
Once the production signals arrive and a migration is justified, the Qdrant versus Pinecone choice is not a performance decision. Qdrant is faster in clean benchmarks - 22ms p95 versus Pinecone's 45ms at 10 million vectors, and roughly twice as fast on indexing. But by the time this choice matters, raw query speed is rarely the deciding constraint.
The real question is infrastructure ownership.
Qdrant is open-source, written in Rust, and deployable anywhere. Qdrant Cloud managed pricing runs around $65 per month at 10 million vectors and approximately $130 per month at 50 million vectors. At 50 million vectors and above, Qdrant Cloud saves roughly 32% versus Pinecone Serverless. (LeanOps, 2026)
Pinecone is fully managed and bills for storage, read units, and write units: around $70 per month at 10 million vectors under normal query load. The important caveat: Ranksquire (2026) found that production bills at sustained agent load run 3-5× above calculator estimates - write unit saturation and capacity fees activate silently once query concurrency climbs. The value Pinecone delivers is not search performance - it is the engineering time your team does not spend running a database. That trade-off changes when you introduce agentic workloads.
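The pricing figures above are easier to compare once the arithmetic is written out. This sketch uses only the numbers quoted in this section. Two things in it are derived rather than sourced: the Pinecone bill at 50 million vectors is back-calculated from the 32% saving, and the 3-5× overrun is applied to the $70 estimate on the assumption that it holds at that scale.

```python
QDRANT_CLOUD_10M = 65        # USD per month, quoted above
QDRANT_CLOUD_50M = 130       # USD per month, quoted above
PINECONE_10M_ESTIMATE = 70   # USD per month, calculator-level figure quoted above
SAVINGS_AT_50M = 0.32        # Qdrant Cloud vs Pinecone Serverless, quoted above
OVERRUN_RANGE = (3, 5)       # production bill vs calculator estimate, quoted above

# Derived: the Pinecone bill that would make $130 a 32% saving.
implied_pinecone_50m = QDRANT_CLOUD_50M / (1 - SAVINGS_AT_50M)

# Assumption: the overrun multiplier applies to the 10M calculator estimate.
pinecone_10m_low, pinecone_10m_high = (
    PINECONE_10M_ESTIMATE * multiplier for multiplier in OVERRUN_RANGE
)

print(round(implied_pinecone_50m))          # 191
print(pinecone_10m_low, pinecone_10m_high)  # 210 350
```

On calculator numbers the two are nearly tied at 10 million vectors ($65 against $70). The gap only becomes large if the agent-load overrun applies, which is why the workload shape matters more than the list price.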
The decision rule: can your team operate self-hosted infrastructure at scale? If yes - and you have engineers who have done it - Qdrant's economics are compelling past 5 million vectors. If no, Pinecone's premium buys something real.
One additional piece of context is worth noting: the standalone vector database category is under pressure. VentureBeat (Q1 2026) reported that Weaviate, Milvus, Pinecone, and Qdrant are all losing adoption share to custom stacks and provider-native retrieval options. This does not make them wrong choices today. It is worth knowing that the category may look different in two years - and that building on Postgres keeps your options open longer.
Decision table
| Situation | Best fit | Why |
|---|---|---|
| You already run Postgres and are early in the product lifecycle | pgvector | Lowest complexity, easiest integration, handles most RAG workloads to 50M+ vectors with proper tuning |
| Retrieval is production-critical and metadata filtering is central to every query | Qdrant | Payload indexing solves the filter+vector performance problem natively; strong cost efficiency at scale |
| You want managed scale and minimal infrastructure ownership | Pinecone | Absorbs operational overhead; right choice when engineering time is worth more than cost savings |
| You are optimising a benchmark rather than a production workload | None yet | Real workload shape matters more than isolated test results. Ship first, then instrument. |
Single-sentence rule: start with pgvector, move to Qdrant when retrieval complexity grows and your team can operate self-hosted infrastructure, and use Pinecone when operational simplicity is worth the premium.
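The table and the single-sentence rule collapse into a small function. This is a sketch of the decision order, not a tool: the function name and its three boolean inputs are hypothetical simplifications, and "hit a migration signal" stands for any of the three signals described earlier.

```python
def recommend(has_production_workload, hit_migration_signal, can_operate_self_hosted):
    """Encode the decision table as an ordered set of checks (illustrative only)."""
    if not has_production_workload:
        return "none yet"   # optimising a benchmark: ship first, then instrument
    if not hit_migration_signal:
        return "pgvector"   # no measured signal, so stay on Postgres
    if can_operate_self_hosted:
        return "Qdrant"     # retrieval complexity plus a team that can run it
    return "Pinecone"       # pay the premium for operational simplicity


print(recommend(True, False, True))   # pgvector
print(recommend(True, True, False))   # Pinecone
```

The order of the checks is the point: the self-hosting question only gets asked after a production signal has been measured.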
Three questions before you migrate
Before scheduling a migration, answer three questions with instrumentation - not intuition.
Is Postgres actually the bottleneck, or are you uncomfortable tuning it? Run EXPLAIN ANALYZE on your slowest vector queries first. The instinct to buy a solution is strong when the alternative is debugging index configuration at 11pm. The database is usually not the problem.
-- Run this on your slowest vector query before considering a migration
-- Look for: Seq Scan instead of Index Scan, high buffer counts, high cost estimates
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
  AND created_at > NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;

-- Red flags in the output:
-- "Seq Scan on documents" with large row estimates → filter is killing the index
-- "Buffers: shared hit=84721" → the query touches a very large number of pages
--   (a high "read" count next to it means those pages were not in cache)
-- "Execution Time: 2300ms" → well above your SLA
--
-- If you see these: tune ef_search and m before moving databases.
-- Most teams resolve Signal 1 entirely through index configuration.
Are filters, scale, or latency actually forcing a different architecture? Identify the specific query pattern that is failing. If you cannot point to a query in production that is measurably failing, you do not have a production problem yet. You have a forecast. Forecasts are not migration triggers.
Would a dedicated database reduce complexity, or just move it somewhere else? A migration introduces sync logic, dual-write handling, and new failure modes. If the problem is selective metadata filtering, Qdrant solves it cleanly. If the problem is unclear, adding a second database is adding complexity to solve uncertainty.
If the answers to all three are unclear, stay on pgvector longer. That is almost always the correct engineering move.
The most expensive vector database decision is the one you make before you have a production problem. Start with Postgres. Tune it. Ship. Then let the actual pressure tell you when it is time to move - because it will.
Sources
- JustSoftLab (May 2025) - Postgres + pgvector vs Pinecone: A Production Benchmark to 50M Vectors: justsoftlab.com
- VentureBeat (Q1 2026) - The Retrieval Rebuild: Why Hybrid Retrieval Intent Tripled as Enterprise RAG Programs Hit the Scale Wall: venturebeat.com
- LeanOps (2026) - Qdrant Cloud Pricing 2026: Saves 32% vs Pinecone at 50M+: leanopstech.com
- Ranksquire (2026) - Vector Database Pricing Comparison 2026: Real Cost Breakdown: ranksquire.com
- MyEngineeringPath (2026) - Pinecone vs Qdrant: Managed Ease or Open-Source Speed?: myengineeringpath.dev
- The New Stack (2025) - Why pgvector Benchmarks Lie: thenewstack.io