Very cool. Being able to use semantic (as opposed to syntactic) operators like `==`, `+`, etc. feels like fertilizer for novel ideas. Sort of like when word embeddings first came out and brought with them a loose concept algebra ("King - Man + Woman = Queen").
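For anyone who hasn't played with that algebra, here's a minimal sketch. The vectors are made up for illustration; real word2vec embeddings are just higher-dimensional versions of the same idea:

```python
# Toy "concept algebra" sketch. The 4-d vectors are invented (dims loosely
# readable as royalty/male/female/person), not real embeddings.
import numpy as np

vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 1.0]),
    "queen": np.array([0.9, 0.1, 0.8, 1.0]),
    "man":   np.array([0.1, 0.8, 0.1, 1.0]),
    "woman": np.array([0.1, 0.1, 0.8, 1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = vocab["king"] - vocab["man"] + vocab["woman"]
# Nearest neighbor, excluding the operands themselves, comes out as "queen".
best = max((w for w in vocab if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vocab[w], target))
print(best)  # queen
```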
That said, the neuro + symbolic integration here is, like most systems, pretty shallow/firewalled (taxonomically, Type 3 / Neuro;Symbolic: https://harshakokel.com/posts/neurosymbolic-systems). I think the real magic is going to come when we start heading toward a much more fundamental integration. We're actually working on this at my company (https://onton.com). How do we create a post-LLM system that:
1) features an integrated representation (neither purely symbolic nor a dense floating-point matrix);
2) can learn incrementally from small amounts of noisy data, without being subject to catastrophic forgetting;
3) can perform mathematical and other symbolic operations with bulletproof reliability; and
4) is hallucination-free?
Cobbling together existing systems hot-glue style is certainly useful, but I think a unified architecture is going to change everything.
We employ a knowledge graph at Deft (https://shopdeft.com) to enable searches over ~1M products, amounting to about 1B triples. Because of the complexity of the queries involved, the expressiveness of our data model (n-ary/reified relations, negation, disjunction, linguistic vagueness, etc.), and our real-time latency targets, we built a graph DB engine "from scratch" (certain components are of course from open-source projects). Even RedisGraph wasn't fast enough for the purpose; ours (Deftgraph) is 700x faster on our queries, thanks to some SOTA optimizations from various recent papers. You'll notice on our site that overall search latency is generally acceptable but not great; the vast majority of that latency comes from 1) LLMs and 2) another, less-optimized graph DB, Datomic, in which we still store some of our data for legacy reasons.
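To make the "n-ary/reified relations" point concrete, here's a generic reification sketch. This is not our actual schema; all the names are hypothetical:

```python
# Reification sketch (hypothetical schema, not Deftgraph's actual one):
# an n-ary fact like "product P is offered in color C at price X" doesn't
# fit a single (subject, predicate, object) triple, so we mint a node for
# the offer and attach each role to it as its own triple.
offer = "offer:123"
triples = [
    (offer, "type",  "Offer"),
    (offer, "item",  "product:P"),
    (offer, "color", "color:C"),
    (offer, "price", 19.99),
]

# "Which products are offered in color C?" then joins through the offer node.
offers = {s for (s, p, o) in triples if p == "color" and o == "color:C"}
items = {o for (s, p, o) in triples if p == "item" and s in offers}
print(items)  # {'product:P'}
```

The price of that flexibility is an extra join per role, which is part of why raw join speed ends up mattering so much at this scale.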
LLMs are great, but knowledge graphs are IMO indispensable for taming their shortcomings.
If you have a graph database that is 700x faster on real-world use cases than the nearest competitor, why aren't you selling it? Given the current AI gold rush, it seems like a no-brainer to get some VC cash, hire some salespeople, and start selling shovels.
We had a similar problem: Datomic/Datascript doesn't have an open format like RDF, but RDF is clunky and slow, so we built our own open-source solution in Rust (https://github.com/triblespace).
On an M1 Max we're currently at ~3µs per query for a single result (so essentially the per-query overhead), and we see something like 1M QRPS for queries with 3-4 joins.
I'm curious whether you've somehow managed to shave off another order of magnitude, as I suspect that most worst-case-optimal (WCO) joins will be similarly limited by memory bandwidth.
We, for example, worked out a novel join-algorithm family (Atreides Join) and supporting trie-based in-memory and succinct zero-copy on-disk data structures, just to get rid of the query optimiser and its massive constant factor.
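For anyone unfamiliar, here's a toy sketch of the generic (worst-case-optimal) join idea that trie-based engines build on: bind one variable at a time by intersecting its candidates across every relation that mentions it. This is the textbook scheme, not the Atreides Join itself:

```python
# Toy generic/worst-case-optimal join over the triangle query
# R(a, b), S(b, c), T(a, c). Textbook scheme, not the Atreides Join.
R = {(0, 1), (0, 2), (1, 2)}
S = {(1, 2), (2, 3), (2, 0)}
T = {(0, 2), (0, 3), (1, 0)}
atoms = [(R, ("a", "b")), (S, ("b", "c")), (T, ("a", "c"))]

def candidates(rel, schema, var, binding):
    """Values `var` can take in `rel`, consistent with the partial binding."""
    i = schema.index(var)
    return {t[i] for t in rel
            if all(t[schema.index(v)] == binding[v]
                   for v in schema if v in binding)}

def generic_join(var_order, binding):
    if not var_order:
        yield dict(binding)
        return
    var, rest = var_order[0], var_order[1:]
    # Intersect this variable's candidates across every atom mentioning it.
    doms = [candidates(rel, schema, var, binding)
            for rel, schema in atoms if var in schema]
    for val in set.intersection(*doms):
        yield from generic_join(rest, {**binding, var: val})

# Finds the three triangles (set iteration order may vary):
# {'a': 0, 'b': 1, 'c': 2}, {'a': 0, 'b': 2, 'c': 3}, {'a': 1, 'b': 2, 'c': 0}
print(list(generic_join(("a", "b", "c"), {})))
```

A real engine does those intersections over tries or sorted indexes rather than scanning, which is where the data-structure work pays off.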
Not sure what you mean. To become a Pioneer, you sign up (pioneer.app), rise up the leaderboard by making progress on your startup, and make the top 50 to be eligible for selection. There's not really a pitch-your-startup process to it.
This is for things that can be done in a few weeks. That feels like something that made more sense in the early days of the web or the early days of phone apps. Those fields are very crowded now.
Pioneer made all the difference for us! It gives you an amazing amount of focus and energy to do the best thing for your startup, which should be obvious but isn't: make actual progress (i.e. make something users want; don't, e.g., do startup competitions or chase press coverage). That's the one metric you're evaluated on, progress, and it's the only one that matters to the life of your startup anyway.
Thanks so much to Daniel and everyone else for making Pioneer a great experience!
P.S. My cofounder and I are building Deft (shopdeft.com): product search for humans, not robots. You can type how you speak, search by image, and search over every product site!