Pgvector Is Now Faster Than Pinecone at 75% Less Cost (timescale.com)
127 points by sh_tomer on June 11, 2024 | 9 comments


Blog co-author here (PM at Timescale).

We're excited to release pgvectorscale. Our team built this extension to make PostgreSQL a better database for AI and to challenge the notion that PostgreSQL and pgvector are not performant for vector workloads.

pgvectorscale is open-source under the PostgreSQL license and free to use on any PostgreSQL database.

Here are two helpful companion reads to the post linked by OP: A benchmark of how PostgreSQL with pgvector and pgvectorscale performs against Pinecone [1], and a technical deep dive into pgvectorscale's StreamingDiskANN index and Statistical Binary Quantization implementations [2].

Questions and feedback welcome!

[1]: https://www.timescale.com/blog/pgvector-vs-pinecone/
[2]: https://www.timescale.com/blog/how-we-made-postgresql-as-fas...


There is a real art to doing "benchmarks", more correctly called "synthetic benchmarks" since they don't reflect actual usage but are intended for comparisons.

I tried pgvector 0.6.2 on an OCI free node (2 CPUs, 64GB) and noticed a few things:

- pgvector build environment does NOT use -O3

- cosine indexing took about 1h with -O3 vs. 6h without (10M-row table, 128 x fp64)

- memory consumption for indexing is huge, I estimated 2x table size

- you can index a table in parts by limiting maintenance_work_mem=, substituting disk I/O for memory, and this only doubles elapsed time
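As a rough sanity check on the memory point above, here is a minimal Python sketch of the arithmetic. It assumes only the raw vector payload counts (no row headers, no index graph overhead) and takes the 2x factor as the estimate quoted in this comment, not a measured figure. The function and constant names are made up for illustration.

```python
# Back-of-envelope sizing for the table described above.
# Assumptions (not measurements): raw vector payload only, and the
# commenter's estimate that index builds need about 2x the table size.

ROWS = 10_000_000           # 10M rows, from the comment
DIMS = 128                  # 128 dimensions, from the comment
BYTES_PER_VALUE = 8         # fp64, as stated in the comment; use 4 for single precision
INDEX_MEMORY_FACTOR = 2.0   # the commenter's "2x table size" estimate


def raw_vector_bytes(rows: int, dims: int, bytes_per_value: int) -> int:
    """Bytes needed to hold the vector values alone."""
    return rows * dims * bytes_per_value


def to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


table_bytes = raw_vector_bytes(ROWS, DIMS, BYTES_PER_VALUE)
index_build_bytes = table_bytes * INDEX_MEMORY_FACTOR

print(f"raw vector data:       {to_gib(table_bytes):.2f} GiB")
print(f"estimated index build: {to_gib(index_build_bytes):.2f} GiB")
```

Changing BYTES_PER_VALUE or INDEX_MEMORY_FACTOR shows how sensitive the estimate is to those two assumptions, which is the kind of guidance the next paragraph asks for.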

My general comment would be: prospective users need effective guidance (beyond the great advice already on the pgvector website) about memory, cpu, and disk.

I really like the pgvectorscale possibilities for faster lookups; some great ideas there.


Great comments!

This is exactly where we see ourselves contributing: both making pgvector faster and more efficient through pgvectorscale, and working to make the AI on Postgres developer experience first class.


Super excited about this launch, happy to answer any questions (or jump into our Discord for more)

The pgvectorscale repo is at: https://github.com/timescale/pgvectorscale


Did you guys have to make any considerations at construction time to make exhaustive cycling less of an issue? From what the blog post says, it seems that the streaming innovations are mostly at query time, but I want to make sure I understand correctly. Thanks and impressive work here!


Really cool!

When should you use pgvector vs pgvectorscale?

Is there any discussion about getting this added as a supported AWS RDS extension?


pgvectorscale only makes pgvector better. The primary developer-facing improvement is the introduction of the StreamingDiskANN index type.

So we would recommend using both from the start. There is no cost (technical or financial) for doing so.

There is discussion about getting this added to AWS RDS (as well as other PostgreSQL providers), but it is too early to share anything.


Thanks for doing this. I'm a huge pgvector fan and user... After trying a number of different options (Pinecone, ChromaDB, FAISS + memory stores), I felt like pgvector offered the best value and project maturity. Performance was my biggest concern, though much of that was based on blog posts that might be FUD rather than real benchmarks.


Agreed :-)



