Note that Neeva wasn't exactly a small player, managing to burn through $77.5M in VC funding over the space of just 3 years[0]. Note also that, at least at the start of that 3-year period, they simply bought in results from Bing rather than building their own index[1].
Sometimes I wonder how much "business" is about getting government subsidies and grants or scamming the masses, then distributing the wealth horizontally to friends, family and elite peers.
Amounts that seem absurd to billions of working-class people disappear daily, apparently with zero value ever created. I've seen this happen enough times IRL to realise that most business is part performance, part deception around very little actual core value. Business theater, almost.
I've seen a lot of that crap on e.g. the Oslo Stock Exchange over the past two decades. Lots of money from local and very inexperienced companies/people (e.g. fishing/shipping/oil money), looking to invest in tech and often randomly landing on quite far-out gambles where it's legitimately really hard to figure out if there's an intent of fraud or not. The exec teams tend to draw disgustingly high comps though. Then the weird companies get insanely hyped for no reason by local "economy journalists". Then there's loads of insider leaks and trading. Oslo Stock Exchange is like the wild west.
I thought this was largely isolated to small immature markets like Norway though.
In a lot of established businesses, executives are basically just looting to enrich themselves and their friends. Wherever I've worked I've seen it: the inner circle comes first, then running the business. It's doubly true in venture capital. VC buddies get installed into high-paying "jobs" as part of raises, and executives take advantage of the perks every way they can. Free money corrupts. But the model I've seen again and again is that someone gets access to money, then they call their friends and have a party at other people's expense. It's how the world works. If you're into that sort of thing, the best thing to do is figure out who is likely to have money to dole out and kiss their ass...
Building a bot to crawl all of the Internet and save it to an index is a fairly straightforward task. As Google and PageRank proved though, it's the algorithm you use to search that index that's valuable. Any idiot can try and run grep against said index and give 30,000 results, of which the one you want is on page 53. So writing the crawler to build the index isn't really a competitive advantage.
Why then reinvent the wheel and spend untold amounts of resources re-crawling the web, when Bing will let you use theirs? What secret sauce does doing your own crawl bring to the table?
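To make the grep-versus-ranking point above concrete, here is a minimal Python sketch. The page URLs, the texts and the term-count scoring are all made up for illustration; real engines use far richer signals (PageRank among them), so treat this only as a toy contrast between "return every match" and "return matches in a useful order".

```python
from collections import Counter

# Hypothetical toy "index": page URL -> crawled text (illustrative data only).
PAGES = {
    "a.example/1": "cheap flights and more cheap deals",
    "c.example/3": "a note that mentions borrow once",
    "b.example/2": "rust borrow checker explained, with borrow examples and borrow rules",
}

def grep_search(term):
    """Unranked: every page containing the term, in storage order."""
    return [url for url, text in PAGES.items() if term in text]

def ranked_search(term):
    """Ranked: the same matches, ordered by how often the term occurs.

    Raw term count is an assumed stand-in for a real relevance score.
    """
    counts = Counter({url: text.split().count(term) for url, text in PAGES.items()})
    return [url for url, n in counts.most_common() if n > 0]

if __name__ == "__main__":
    print(grep_search("borrow"))
    print(ranked_search("borrow"))
```

Both functions find the same two pages for "borrow"; only the ranked version puts the page that is actually about the term first.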
You're conflating crawling with querying/ranking in a weird way. And: grep - are you serious?
(Yes, you also namedropped PageRank for some odd reason.)
The thing is, though: You can't easily outsource the crawling and then do the querying/ranking in-house. The reverse index and various other data structures you need are inherently tied to the data structures from the crawler output. This is a very large amount of data and it changes often.
The outsourcing that is being done is at the "search query to results" level. That is why this is so disappointing.
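As a rough illustration of why the index is tied to the crawler output, here is a minimal Python sketch of a reverse (inverted) index. The document ids, the whitespace tokenization and the function names are assumptions made for the example, not how any particular search engine does it.

```python
from collections import defaultdict

def build_inverted_index(crawl):
    """Map each term to {doc_id: [positions]} for the given crawl output.

    `crawl` is a hypothetical dict of doc_id -> page text.
    """
    index = defaultdict(dict)
    for doc_id, text in crawl.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search(index, query):
    """Return the sorted doc ids that contain every query term (boolean AND)."""
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

if __name__ == "__main__":
    crawl = {
        1: "Neeva search engine",
        2: "Bing search index",
        3: "private search engine",
    }
    index = build_inverted_index(crawl)
    print(search(index, "search engine"))
```

Every posting refers to a document id and a token position from one specific crawl, so when pages are re-crawled or change, the affected postings have to be rebuilt along with them.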
[0] https://www.crunchbase.com/organization/neeva
[1] https://www.protocol.com/neeva-search