Neon runs Postgres with the pgvector extension and autoscales compute based on load. The same compute that idles at 0.25 CU between requests can scale up to 16 CU during a burst of similarity searches, then drop back. When traffic stops entirely, compute scales to zero after 5 minutes. AI workloads that go from quiet to busy and back fit this model well.

Why AI workloads need autoscaling

Vector similarity searches are CPU-heavy. An HNSW query on a few million embeddings can pin a CPU for hundreds of milliseconds, but the rest of the time the database may be nearly idle while users read responses or wait on the LLM. A fixed-size database has to be provisioned for the spike, which means paying for the spike every hour of the month.

Neon's autoscaling resizes compute between a minimum and maximum that you set. The ceiling depends on the plan:

  • Free: autoscale up to 2 CU
  • Launch: autoscale up to 16 CU
  • Scale: autoscale up to 16 CU, fixed sizes up to 56 CU

Compute is billed in CU-hours at the average size during each hour, so you pay for the spike only while it lasts.
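To make the billing arithmetic concrete, here is a minimal sketch that compares CU-hours for an autoscaled day against a fixed-size compute provisioned for the spike. The traffic shape (22 quiet hours at 0.25 CU, 2 busy hours averaging 8 CU) is a made-up illustration, and the sketch ignores scale to zero and per-plan pricing.

```python
def cu_hours(hourly_avg_cu):
    """Sum CU-hours given the average compute size (in CU) for each hour."""
    return sum(hourly_avg_cu)


# Hypothetical day: 22 quiet hours at the 0.25 CU minimum,
# plus 2 busy hours that average 8 CU.
autoscaled_day = [0.25] * 22 + [8.0] * 2

# The same day on a fixed-size compute provisioned for the spike.
fixed_day = [8.0] * 24

autoscaled = cu_hours(autoscaled_day)  # 22 * 0.25 + 2 * 8 = 21.5 CU-hours
fixed = cu_hours(fixed_day)            # 24 * 8 = 192 CU-hours

print(f"autoscaled: {autoscaled} CU-hours, fixed: {fixed} CU-hours")
```

Under these assumed numbers the autoscaled day uses roughly a ninth of the CU-hours. The real ratio depends entirely on how long your spikes last.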

pgvector setup

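-- Enable pgvector (safe to re-run if it is already installed).
CREATE EXTENSION IF NOT EXISTS vector;

-- The vector dimension must match your embedding model's output (1536 here).
CREATE TABLE embeddings (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)
);

-- HNSW index for approximate nearest-neighbor search by cosine distance.
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops);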
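Once the table and index exist, a similarity search orders rows by cosine distance to a query embedding. The sketch below shows one way to build that query from Python. The helper names (`to_vector_literal`, `top_k_params`) are hypothetical, and the named-placeholder style assumes a driver such as psycopg. `<=>` is pgvector's cosine distance operator, which matches the `vector_cosine_ops` index.

```python
def to_vector_literal(values):
    """Format numbers as pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Nearest neighbors by cosine distance. The ORDER BY expression repeats the
# distance expression so the planner can match it to the HNSW index.
TOP_K_SQL = """
SELECT id, content, embedding <=> %(q)s::vector AS distance
FROM embeddings
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s
"""


def top_k_params(query_embedding, k=5):
    """Build the parameter dict for TOP_K_SQL (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {"q": to_vector_literal(query_embedding), "k": k}


# Example parameters for a toy 3-dimensional vector; a real query against the
# table above would pass a 1536-dimensional embedding.
params = top_k_params([0.1, 0.2, 1], k=3)
print(params)
```

With a psycopg connection, the call would look like `cur.execute(TOP_K_SQL, top_k_params(query_embedding))`, assuming `query_embedding` comes from the same model that produced the stored rows.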

For end-to-end examples with OpenAI, LangChain, and LlamaIndex, see AI and embeddings.

Branch for embedding experiments

Test a new embedding model on a branch of your production data without re-embedding the whole corpus twice. Branches start instantly and share storage until you change something.
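One way to judge a candidate model on a branch is to run the same queries against both sets of embeddings and compare the returned ids. The sketch below computes a simple overlap score. The function name and the metric are illustrative assumptions, not part of Neon or pgvector.

```python
def overlap_at_k(baseline_ids, candidate_ids, k):
    """Fraction of the baseline top-k ids that also appear in the candidate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    base = set(baseline_ids[:k])
    cand = set(candidate_ids[:k])
    if not base:
        return 0.0
    return len(base & cand) / len(base)


# Hypothetical result ids for one query: current model vs. model on the branch.
production_top = [101, 102, 103, 104]
branch_top = [102, 103, 250, 251]

score = overlap_at_k(production_top, branch_top, k=4)
print(score)
```

A low score only shows that the new model ranks results differently. Whether the new ranking is better still needs relevance judgments on your own data.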

How other providers stack up

| Provider | pgvector | Autoscaling compute | Scale to zero |
|---|---|---|---|
| Neon | Yes | Between min/max CU, second-level scaling | 5 min idle, sub-second wake |
| Aurora Serverless v2 (PostgreSQL) | Yes | ACU range, scales automatically | 0 ACU auto-pause on Aurora PostgreSQL 13.15+/14.12+/15.7+/16.3+ |
| Supabase | Yes | Manual compute size change, brief downtime | Paid plans run 24/7 |
| RDS for PostgreSQL | Yes | None on the database compute | None |

For AI inference workloads that swing between dozens of queries per second and minutes of idle time, the architectures that match are Neon and Aurora Serverless v2. Both bill compute by its size at each moment, and both run pgvector with HNSW indexes. Neon's wake time is faster; Aurora's regional and IAM integration is deeper if you're already on AWS.

Run pgvector on autoscaling Postgres

The Free plan includes pgvector, HNSW indexes, and 100 CU-hours of compute.