Pgvector Is Now Faster Than Pinecone at 75% Less Cost

avthar · on June 11, 2024

Blog co-author here (PM at Timescale).

We're excited to release pgvectorscale. Our team built this extension to make PostgreSQL a better database for AI and to challenge the notion that PostgreSQL and pgvector are not performant for vector workloads.

pgvectorscale is open-source under the PostgreSQL license and free to use on any PostgreSQL database.

Here are two helpful companion reads to the post linked by OP: A benchmark of how PostgreSQL with pgvector and pgvectorscale performs against Pinecone [1], and a technical deep dive into pgvectorscale's StreamingDiskANN index and Statistical Binary Quantization implementations [2].

Questions and feedback welcome!

[1]: https://www.timescale.com/blog/pgvector-vs-pinecone/
[2]: https://www.timescale.com/blog/how-we-made-postgresql-as-fas...
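For readers new to binary quantization, here is a simplified sketch of the idea behind the Statistical Binary Quantization mentioned above. It assumes the core trick is to threshold each dimension at its mean over the data instead of at zero, and it packs one bit per dimension. The function names are hypothetical. This is not pgvectorscale's code, and the real implementation is more involved (see [2]).

```python
from statistics import fmean


def fit_thresholds(vectors: list[list[float]]) -> list[float]:
    """Per-dimension mean over the data; used as each dimension's bit threshold."""
    return [fmean(column) for column in zip(*vectors)]


def quantize(vector: list[float], thresholds: list[float]) -> int:
    """Pack one bit per dimension: 1 if the value is above that dimension's mean."""
    code = 0
    for d, (x, t) in enumerate(zip(vector, thresholds)):
        if x > t:
            code |= 1 << d
    return code


def hamming(a: int, b: int) -> int:
    """Count differing bits: a cheap stand-in for distance between quantized vectors."""
    return bin(a ^ b).count("1")


# Toy data: three 3-dimensional vectors with all-positive values.
vectors = [[0.9, 0.1, 0.7], [0.8, 0.2, 0.6], [0.1, 0.9, 0.2]]
thresholds = fit_thresholds(vectors)
codes = [quantize(v, thresholds) for v in vectors]
print(codes, hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```

With these all-positive values, a plain zero threshold would map every vector to the same code. Using per-dimension means keeps the first two (similar) vectors together and separates the third.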

j_not_j · on June 11, 2024

There is a real art to doing "benchmarks", or more precisely "synthetic benchmarks": they don't reflect actual usage, but are intended for comparisons.

I tried pgvector 0.6.2 on an OCI free node (2 CPUs, 64 GB) and noticed a few things:

- the pgvector build environment does NOT use -O3

- cosine index builds took 1h with -O3 versus 6h without it (a 10M-row table of 128-dimension fp64 vectors)

- memory consumption during indexing is huge; I estimated about 2x the table size

- you can build over parts of the table at a time (via maintenance_work_mem=), trading disk I/O for memory, and this only doubles elapsed time
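A rough back-of-the-envelope version of the memory point, as a sketch. It assumes pgvector's documented storage of 4 * dimensions + 8 bytes per vector (elements are stored as single-precision floats, so fp64 inputs take 4 bytes per dimension once loaded). It ignores row and page overhead and uses the ~2x build multiplier estimated above, not any documented rule. The function names are hypothetical.

```python
import math

GIB = 1024 ** 3  # PostgreSQL's "GB" unit in memory settings is 1024^3 bytes


def pgvector_bytes_per_vector(dims: int) -> int:
    """Storage for one pgvector `vector` value: 4 bytes per element plus an 8-byte header."""
    return 4 * dims + 8


def estimate_build_memory(rows: int, dims: int, multiplier: float = 2.0) -> tuple[int, int]:
    """Return (raw vector bytes, estimated index-build memory in bytes).

    Ignores per-row and per-page overhead, so the real table is larger.
    `multiplier` is the commenter's empirical ~2x figure, not a pgvector rule.
    """
    raw = rows * pgvector_bytes_per_vector(dims)
    return raw, int(raw * multiplier)


def maintenance_work_mem_hint(build_bytes: int) -> str:
    """Round the estimate up to whole GB and format it as a SET statement."""
    return f"SET maintenance_work_mem = '{math.ceil(build_bytes / GIB)}GB';"


# The 10M x 128-dimension case from the comment above.
raw, build = estimate_build_memory(rows=10_000_000, dims=128)
print(f"vectors: {raw / GIB:.2f} GiB, build estimate: {build / GIB:.2f} GiB")
print(maintenance_work_mem_hint(build))
```

If the hinted value does not fit on the machine, a smaller maintenance_work_mem still works; per the observation above, the cost was roughly double the elapsed time.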

My general comment would be: prospective users need effective guidance (beyond the great advice already on the pgvector website) about memory, CPU, and disk.

I really like the pgvectorscale possibilities for faster lookups; some great ideas there.

jamesgresql · on June 11, 2024

Great comments!

This is exactly where we see ourselves contributing: both making pgvector faster and more efficient through pgvectorscale, and working to make the AI on Postgres developer experience first class.

jamesgresql · on June 11, 2024

Super excited about this launch; happy to answer any questions (or jump into our Discord for more).

The pgvectorscale repo is at: https://github.com/timescale/pgvectorscale

fhenrywells · on June 12, 2024

Did you guys have to make any special considerations at index construction time to make exhaustive cycling less of an issue? From the blog post, it seems the streaming innovations are mostly at query time, but I want to make sure I understand correctly. Thanks, and impressive work here!

brycelarkin · on June 12, 2024

Really cool!

When should you use pgvector vs pgvectorscale?

Is there any discussion about getting this added as a supported AWS RDS extension?

akulkarni · on June 16, 2024

pgvectorscale only makes pgvector better. The primary developer-facing improvement is the introduction of the StreamingDiskANN index type.

So we would recommend using both from the start. There is no cost (technical or financial) for doing so.

There is discussion about getting this added to AWS RDS (as well as other PostgreSQL providers), but it's too early to share anything.
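The StreamingDiskANN index mentioned above can be illustrated with a toy example of why streaming matters at query time. It assumes "streaming" means the index can keep producing the next-closest candidate on demand instead of returning a fixed-size candidate list. When a WHERE filter rejects most candidates, a fixed list can come back with fewer than k rows, while a stream just keeps going. This is a generic best-first search over an in-memory graph with hypothetical function names, not pgvectorscale's on-disk DiskANN implementation.

```python
import heapq
import itertools
import math
from collections.abc import Callable, Iterable, Iterator

Vector = tuple[float, ...]


def stream_neighbors(graph: dict[int, list[int]], points: dict[int, Vector],
                     query: Vector, entry: int) -> Iterator[tuple[float, int]]:
    """Best-first walk over a proximity graph, yielding (distance, node) lazily.

    Nodes come out roughly closest-first; the caller decides when to stop.
    """
    frontier = [(math.dist(points[entry], query), entry)]
    seen = {entry}
    while frontier:
        dist, node = heapq.heappop(frontier)
        yield dist, node
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                heapq.heappush(frontier, (math.dist(points[nb], query), nb))


def streaming_filter(stream: Iterable[tuple[float, int]],
                     predicate: Callable[[int], bool], k: int) -> list[int]:
    """Keep pulling candidates until k pass the filter (or the graph is exhausted)."""
    matches = []
    for _, node in stream:
        if predicate(node):
            matches.append(node)
            if len(matches) == k:
                break
    return matches


def fixed_list_filter(stream: Iterable[tuple[float, int]],
                      predicate: Callable[[int], bool], k: int, ef: int) -> list[int]:
    """Contrast: take a fixed-size candidate list of `ef` nodes first, then filter it."""
    candidates = itertools.islice(stream, ef)
    return [node for _, node in candidates if predicate(node)][:k]


# Toy 1-D data: nodes 0..4 at positions 0..4, linked in a chain.
points = {i: (float(i),) for i in range(5)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
query = (0.0,)


def selective(node: int) -> bool:
    """A filter that rejects every node except the farthest one."""
    return node >= 4


print(streaming_filter(stream_neighbors(graph, points, query, 0), selective, k=1))
print(fixed_list_filter(stream_neighbors(graph, points, query, 0), selective, k=1, ef=3))
```

The streaming version finds node 4 because it keeps walking the graph; the fixed list of three candidates never reaches it and returns nothing.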

cl42 · on June 11, 2024

Thanks for doing this. I'm a huge pgvector fan and user... After trying a number of different options (Pinecone, ChromaDB, FAISS + memory stores), I felt like pgvector offered the best value and project maturity. Performance was my biggest concern, though much of that was based on blog posts that might be FUD rather than real benchmarks.

akulkarni · on June 16, 2024

Agreed :-)